Thanos Makatos
2013-Mar-12 12:56 UTC
[PATCH 0 of 4 RFC] blktap3/sring: shared ring between tapdisk and the front-end
This patch series introduces the shared ring used by the front-end to pass request descriptors to tapdisk, and by tapdisk to return responses to the front-end. Requests from this ring end up in tapdisk's standard request queue. When the tapback daemon detects that the front-end is trying to connect to the back-end, it spawns a tapdisk and tells it to connect to the shared ring. The shared ring is created by tapdisk using the grant references and the event channel port supplied by tapback. Once the ring is created, tapdisk watches the event channel for notifications. When a notification is received, tapdisk extracts the request, parses it, and passes it to the standard tapdisk request queue for processing.

Signed-off-by: Thanos Makatos <thanos.makatos@citrix.com>
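As a rough illustration of the connect step described above (mapping the grant references handed over by tapback and binding to the supplied event channel port), a minimal sketch using the libxenctrl xc_gnttab_*/xc_evtchn_* interfaces might look as follows. The function and variable names are illustrative only and error handling is abbreviated; the real implementation is in patch 1 of this series.

/* Illustrative sketch only: how a back-end might map the ring pages and bind
 * to the event channel port supplied by tapback.  Uses the classic
 * libxenctrl xc_gnttab/xc_evtchn interfaces; cleanup on error is omitted. */
#include <errno.h>
#include <stdio.h>
#include <sys/mman.h>
#include <xenctrl.h>
#include <xen/io/blkif.h>
#include <xen/io/ring.h>

static int
connect_ring_sketch(domid_t domid, grant_ref_t *grefs, int order,
                    evtchn_port_t remote_port)
{
    xc_gnttab *xcg;
    xc_evtchn *xce;
    void *sring;
    evtchn_port_or_error_t local_port;
    blkif_back_ring_t ring;

    xcg = xc_gnttab_open(NULL, 0);
    xce = xc_evtchn_open(NULL, 0);
    if (!xcg || !xce)
        return errno;

    /* Map the pages that hold the request descriptors. */
    sring = xc_gnttab_map_domain_grant_refs(xcg, 1 << order, domid, grefs,
                                            PROT_READ | PROT_WRITE);
    if (!sring)
        return errno;

    /* Initialise the back-end view of the ring over the mapped pages. */
    BACK_RING_INIT(&ring, (blkif_sring_t *)sring, XC_PAGE_SIZE << order);

    /* Bind to the front-end's port; notifications will then arrive on the
     * file descriptor returned by xc_evtchn_fd(xce), which the main event
     * loop can select() on. */
    local_port = xc_evtchn_bind_interdomain(xce, domid, remote_port);
    if (local_port < 0)
        return errno;

    printf("ring mapped at %p, local event channel port %d\n",
           sring, local_port);
    return 0;
}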
Thanos Makatos
2013-Mar-12 12:56 UTC
[PATCH 1 of 4 RFC] blktap3/sring: connect to/disconnect from the shared ring
This patch introduces the functions that allows tapdisk to connect to/disconnect from the shared ring. They are message handlers executed when the tapback daemon sends a TAPDISK_MESSAGE_XENBLKIF_CONNECT/DISCONNECT message to the tapdisk, as a result of running the Xenbus protocol. The connection to the ring is effectively established by mapping the grant references and binding to the guest domain''s event channel port (both the grant references and the port are supplied by the tapback daemon). After the connection is established, a callback is registered to be executed when a notification for the ring arrives. Signed-off-by: Thanos Makatos <thanos.makatos@citrix.com> diff -r 1712e6f8bb34 -r 93727c736ff0 tools/blktap3/drivers/sring/blkif.h --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/blktap3/drivers/sring/blkif.h Tue Mar 12 12:54:23 2013 +0000 @@ -0,0 +1,87 @@ +/* + * Copyright (C) 2012 Citrix Ltd. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, + * USA. + */ + +#ifndef __SRING_BLKIF_H__ +#define __SRING_BLKIF_H__ + +#include <xen/io/blkif.h> + +/* Not a real protocol. Used to generate ring structs which contain + * the elements common to all protocols only. This way we get a + * compiler-checkable way to use common struct elements, so we can + * avoid using switch(protocol) in a number of places. */ +struct blkif_common_request { + char dummy; +}; +struct blkif_common_response { + char dummy; +}; + +/* i386 protocol version */ +#pragma pack(push, 4) +struct blkif_x86_32_request { + uint8_t operation; /* BLKIF_OP_??? */ + uint8_t nr_segments; /* number of segments */ + blkif_vdev_t handle; /* only for read/write requests */ + uint64_t id; /* private guest value, echoed in resp */ + blkif_sector_t sector_number; /* start sector idx on disk (r/w only) */ + struct blkif_request_segment seg[BLKIF_MAX_SEGMENTS_PER_REQUEST]; +}; +struct blkif_x86_32_response { + uint64_t id; /* copied from request */ + uint8_t operation; /* copied from request */ + int16_t status; /* BLKIF_RSP_??? */ +}; +typedef struct blkif_x86_32_request blkif_x86_32_request_t; +typedef struct blkif_x86_32_response blkif_x86_32_response_t; +#pragma pack(pop) + +/* x86_64 protocol version */ +struct blkif_x86_64_request { + uint8_t operation; /* BLKIF_OP_??? */ + uint8_t nr_segments; /* number of segments */ + blkif_vdev_t handle; /* only for read/write requests */ + uint64_t __attribute__ ((__aligned__(8))) id; + blkif_sector_t sector_number; /* start sector idx on disk (r/w only) */ + struct blkif_request_segment seg[BLKIF_MAX_SEGMENTS_PER_REQUEST]; +}; +struct blkif_x86_64_response { + uint64_t __attribute__ ((__aligned__(8))) id; + uint8_t operation; /* copied from request */ + int16_t status; /* BLKIF_RSP_??? 
*/ +}; +typedef struct blkif_x86_64_request blkif_x86_64_request_t; +typedef struct blkif_x86_64_response blkif_x86_64_response_t; + +DEFINE_RING_TYPES(blkif_common, struct blkif_common_request, + struct blkif_common_response); +DEFINE_RING_TYPES(blkif_x86_32, struct blkif_x86_32_request, + struct blkif_x86_32_response); +DEFINE_RING_TYPES(blkif_x86_64, struct blkif_x86_64_request, + struct blkif_x86_64_response); + +union blkif_back_rings { + blkif_back_ring_t native; + blkif_common_back_ring_t common; + blkif_x86_32_back_ring_t x86_32; + blkif_x86_64_back_ring_t x86_64; +}; +typedef union blkif_back_rings blkif_back_rings_t; + +#endif /* __SRING_BLKIF_H__ */ diff -r 1712e6f8bb34 -r 93727c736ff0 tools/blktap3/drivers/sring/td-blkif.c --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/blktap3/drivers/sring/td-blkif.c Tue Mar 12 12:54:23 2013 +0000 @@ -0,0 +1,224 @@ +/* + * Copyright (C) 2012 Citrix Ltd. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, + * USA. + */ + +#include <assert.h> +#include <stdlib.h> +#include <errno.h> +#include <syslog.h> +#include <sys/mman.h> + +#include "blktap3.h" +#include "tapdisk.h" + +#include "td-blkif.h" +#include "td-ctx.h" +#include "td-req.h" + +struct td_xenblkif * +tapdisk_xenblkif_find(const domid_t domid, const int devid) +{ + struct td_xenblkif *blkif = NULL; + struct td_xenio_ctx *ctx; + + tapdisk_xenio_for_each_ctx(ctx) { + tapdisk_xenio_ctx_find_blkif(ctx, blkif, + blkif->domid == domid && + blkif->devid == devid); + if (blkif) + return blkif; + } + + return NULL; +} + +void +tapdisk_xenblkif_destroy(struct td_xenblkif * blkif) +{ + assert(blkif); + + tapdisk_xenblkif_reqs_free(blkif); + + if (blkif->ctx) { + if (blkif->port >= 0) + xc_evtchn_unbind(blkif->ctx->xce_handle, blkif->port); + + if (blkif->rings.common.sring) + xc_gnttab_munmap(blkif->ctx->xcg_handle, blkif->rings.common.sring, + blkif->ring_n_pages); + + TAILQ_REMOVE(&blkif->ctx->blkifs, blkif, entry); + tapdisk_xenio_ctx_put(blkif->ctx); + } + + free(blkif); +} + +int +tapdisk_xenblkif_disconnect(const domid_t domid, const int devid) +{ + struct td_xenblkif *blkif; + + blkif = tapdisk_xenblkif_find(domid, devid); + if (!blkif) + return ESRCH; + + if (blkif->n_reqs_free != blkif->ring_size) + return EBUSY; + + tapdisk_xenblkif_destroy(blkif); + + return 0; +} + +int +tapdisk_xenblkif_connect(domid_t domid, int devid, const grant_ref_t * grefs, + int order, evtchn_port_t port, int proto, const char *pool, + td_vbd_t * vbd) +{ + struct td_xenblkif *td_blkif = NULL; + struct td_xenio_ctx *td_ctx; + int err; + int i; + void *sring; + size_t sz; + + /* + * Already connected? 
+ */ + if (tapdisk_xenblkif_find(domid, devid)) { + /* TODO log error */ + return EEXIST; + } + + err = tapdisk_xenio_ctx_get(pool, &td_ctx); + if (err) { + /* TODO log error */ + goto fail; + } + + td_blkif = calloc(1, sizeof(*td_blkif)); + if (!td_blkif) { + /* TODO log error */ + err = errno; + goto fail; + } + + td_blkif->domid = domid; + td_blkif->devid = devid; + td_blkif->vbd = vbd; + td_blkif->ctx = td_ctx; + td_blkif->proto = proto; + + /* + * Create the shared ring. + */ + td_blkif->ring_n_pages = 1 << order; + if (td_blkif->ring_n_pages > ARRAY_SIZE(td_blkif->ring_ref)) { + syslog(LOG_ERR, "too many pages (%d), max %lu\n", + td_blkif->ring_n_pages, ARRAY_SIZE(td_blkif->ring_ref)); + err = EINVAL; + goto fail; + } + + /* + * TODO Why don''t we just keep a copy of the array''s address? There should + * be a reason for copying the addresses of the pages, figure out why. + * TODO Why do we even store it in the td_blkif in the first place? + */ + for (i = 0; i < td_blkif->ring_n_pages; i++) + td_blkif->ring_ref[i] = grefs[i]; + + /* + * Map the grant references that will be holding the request descriptors. + */ + sring = xc_gnttab_map_domain_grant_refs(td_blkif->ctx->xcg_handle, + td_blkif->ring_n_pages, td_blkif->domid, td_blkif->ring_ref, + PROT_READ | PROT_WRITE); + if (!sring) { + err = errno; + syslog(LOG_ERR, "failed to map domain''s %d grant references: %s\n", + domid, strerror(err)); + goto fail; + } + + /* + * Size of the ring, in bytes. + */ + sz = XC_PAGE_SIZE << order; + + /* + * Initialize the mapped address into the shared ring. + * + * TODO Check for protocol support in the beginning of this function. + */ + switch (td_blkif->proto) { + case BLKIF_PROTOCOL_NATIVE: + { + blkif_sring_t *__sring = sring; + BACK_RING_INIT(&td_blkif->rings.native, __sring, sz); + break; + } + case BLKIF_PROTOCOL_X86_32: + { + blkif_x86_32_sring_t *__sring = sring; + BACK_RING_INIT(&td_blkif->rings.x86_32, __sring, sz); + break; + } + case BLKIF_PROTOCOL_X86_64: + { + blkif_x86_64_sring_t *__sring = sring; + BACK_RING_INIT(&td_blkif->rings.x86_64, __sring, sz); + break; + } + default: + syslog(LOG_ERR, "unsupported protocol 0x%x\n", td_blkif->proto); + err = EPROTONOSUPPORT; + goto fail; + } + + /* + * Bind to the remote port. + * FIXME elaborate + */ + td_blkif->port = xc_evtchn_bind_interdomain(td_blkif->ctx->xce_handle, + td_blkif->domid, port); + if (td_blkif->port == -1) { + err = errno; + syslog(LOG_ERR, "failed to bind to event channel port %d of domain " + "%d: %s\n", port, td_blkif->domid, strerror(err)); + goto fail; + } + + err = tapdisk_xenblkif_reqs_init(td_blkif); + if (err) { + /* TODO log error */ + goto fail; + } + + TAILQ_INSERT_TAIL(&td_ctx->blkifs, td_blkif, entry); + + return 0; + +fail: + if (td_blkif) + tapdisk_xenblkif_destroy(td_blkif); + + return err; +} + diff -r 1712e6f8bb34 -r 93727c736ff0 tools/blktap3/drivers/sring/td-blkif.h --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/blktap3/drivers/sring/td-blkif.h Tue Mar 12 12:54:23 2013 +0000 @@ -0,0 +1,164 @@ +/* + * Copyright (C) 2012 Citrix Ltd. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, + * USA. + */ + +#ifndef __TD_BLKIF_H__ +#define __TD_BLKIF_H__ + +#include <inttypes.h> /* TODO required by xen/event_channel.h */ +#include <xen/event_channel.h> + +#include "blkif.h" +#include "td-req.h" +#include "td-stats.h" + +struct td_xenio_ctx; + +struct td_xenblkif { + + /** + * The domain ID where the front-end is running. + */ + int domid; + + /** + * The device ID of the VBD. + */ + int devid; + + + /** + * Pointer to the context this block interface belongs to. + */ + struct td_xenio_ctx *ctx; + + /** + * for linked lists. + */ + TAILQ_ENTRY(td_xenblkif) entry; + + /** + * The local port corresponding to the remote port of the domain where the + * front-end is running. We use this to tell for which VBD a pending event + * is, and for notifying the front-end for responses we have produced and + * placed in the shared ring. + */ + evtchn_port_t port; + + /** + * protocol (native, x86, or x64) + * Need to keep around? Replace with function pointer? + */ + int proto; + + blkif_back_rings_t rings; + + /** + * TODO Why 8 specifically? + * TODO Do we really need to keep it around? + */ + grant_ref_t ring_ref[8]; + + /** + * Number of pages in the ring that holds the request descriptors. + */ + int ring_n_pages; + + /* + * Size of the ring, expressed in requests. + * TODO Do we really need to keep this around? + */ + int ring_size; + + /** + * Intermediate requests. The array is managed as a stack, with n_reqs_free + * pointing to the top of the stack, at the next available intermediate + * request. + */ + struct td_xenblkif_req *reqs; + + /** + * Stack pointer to the aforementioned stack. + */ + int n_reqs_free; + + blkif_request_t **reqs_free; + + /* + * Pointer to the actual VBD. + */ + td_vbd_t *vbd; + + /* + * stats + */ + struct td_xenblkif_stats stats; +}; + +#define tapdisk_xenio_for_each_ctx(_ctx) \ + TAILQ_FOREACH(_ctx, &_td_xenio_ctxs, entry) + +/** + * Connects the tapdisk to the shared ring. + * + * @param domid the ID of the guest domain + * @param devid the device ID + * @param grefs the grant references + * @param order number of grant references + * @param port event channel port of the guest domain to use for ring + * notifications + * @param proto protocol (native, x86, or x64) + * @param pool name of the context + * @param vbd the VBD + * @returns 0 on success + */ +int +tapdisk_xenblkif_connect(domid_t domid, int devid, const grant_ref_t * grefs, + int order, evtchn_port_t port, int proto, const char *pool, + td_vbd_t * vbd); + +/** + * Disconnects the tapdisk from the shared ring. + * + * @param domid the domain ID of the guest domain + * @param devid the device ID of the VBD + * @returns 0 on success + */ +int +tapdisk_xenblkif_disconnect(const domid_t domid, const int devid); + +/** + * Destroys a XEN block interface. + * + * @param blkif the block interface to destroy + */ +void +tapdisk_xenblkif_destroy(struct td_xenblkif * blkif); + +/** + * Searches all block interfaces in all contexts for a block interface + * having the specified domain and device ID. 
+ * + * @param domid the domain ID + * @param devid the device ID + * @returns a pointer to the block interface if found, else NULL + */ +struct td_xenblkif * +tapdisk_xenblkif_find(const domid_t domid, const int devid); + +#endif /* __TD_BLKIF_H__ */
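A hypothetical example of how the TAPDISK_MESSAGE_XENBLKIF_CONNECT/DISCONNECT message handlers might drive the interface declared in td-blkif.h above; the handler names and the EBUSY handling policy are illustrative, not part of the patch.

/* Hypothetical callers of the td-blkif.h interface above. */
#include <errno.h>
#include <string.h>
#include <syslog.h>

#include "td-blkif.h"

static int
handle_connect_sketch(domid_t domid, int devid, const grant_ref_t *grefs,
                      int order, evtchn_port_t port, int proto,
                      const char *pool, td_vbd_t *vbd)
{
    int err;

    err = tapdisk_xenblkif_connect(domid, devid, grefs, order, port,
                                   proto, pool, vbd);
    if (err)
        syslog(LOG_ERR, "failed to connect d%d/%d to the ring: %s\n",
               domid, devid, strerror(err));
    return err;
}

static int
handle_disconnect_sketch(domid_t domid, int devid)
{
    int err;

    err = tapdisk_xenblkif_disconnect(domid, devid);
    if (err == EBUSY)
        /* Requests are still in flight; the caller must retry later. */
        syslog(LOG_INFO, "d%d/%d still has requests in flight\n",
               domid, devid);
    return err;
}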
Thanos Makatos
2013-Mar-12 12:56 UTC
[PATCH 2 of 4 RFC] blktap3/sring: extract requests from the ring, grouping of VBDs
This patch introduces the functions that extract requests from the shared ring. When a VBD is created, the file descriptor of the event channel is added to the set of the file descriptors watched by the main select(). When a notification arrives, the callback associated with it extracts requests from the ring. Parsing of the requests and passing them to the tapdisk standard request queue is introduced by another patch in this series. Also, there is another piece of functionality introduced for which I have some questions: two or more VBDs served by the same tapdisk process can be grouped into a "context". This context carries the handle to the event channel driver used to bind to the guest domain''s port in order to receive/send notifications for the shared ring. I don''t see any merit by this grouping and I think it should be removed, is there something I am overseeing? Signed-off-by: Thanos Makatos <thanos.makatos@citrix.com> diff -r 93727c736ff0 -r 8a9e0b69b44a tools/blktap3/drivers/sring/td-ctx.c --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/blktap3/drivers/sring/td-ctx.c Tue Mar 12 12:55:57 2013 +0000 @@ -0,0 +1,442 @@ +/* + * Copyright (C) 2012 Citrix Ltd. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, + * USA. + */ + +#include <assert.h> +#include <errno.h> +#include <syslog.h> + +#include "tapdisk-server.h" + +#include "td-ctx.h" + +struct tqh_td_xenio_ctx _td_xenio_ctxs + = TAILQ_HEAD_INITIALIZER(_td_xenio_ctxs); + +/** + * FIXME releases a pool? + */ +static void +tapdisk_xenio_ctx_close(struct td_xenio_ctx * const ctx) +{ + assert(ctx); + + if (ctx->ring_event >= 0) { + tapdisk_server_unregister_event(ctx->ring_event); + ctx->ring_event = -1; + } + + if (ctx->xce_handle) { + xc_evtchn_close(ctx->xce_handle); + ctx->xce_handle = NULL; + } + + if (ctx->xcg_handle) { + xc_evtchn_close(ctx->xcg_handle); + ctx->xcg_handle = NULL; + } + + TAILQ_REMOVE(&_td_xenio_ctxs, ctx, entry); + + /* FIXME when do we free it? */ +} + +/* + * XXX only called by tapdisk_xenio_ctx_ring_event + */ +static inline struct td_xenblkif * +xenio_pending_blkif(struct td_xenio_ctx * const ctx) +{ + evtchn_port_or_error_t port; + struct td_xenblkif *blkif; + int err; + + assert(ctx); + + /* + * Get the local port for which there is a pending event. + */ + port = xc_evtchn_pending(ctx->xce_handle); + if (port == -1) { + /* TODO log error */ + return NULL; + } + + /* + * Find the block interface with that local port. + */ + tapdisk_xenio_ctx_find_blkif(ctx, blkif, blkif->port == port); + if (blkif) { + err = xc_evtchn_unmask(ctx->xce_handle, port); + if (err) { + /* TODO log error */ + return NULL; + } + } + /* + * TODO Is it possible to have an pending event channel but no block + * interface associated with it? 
+ */ + + return blkif; +} + +#define blkif_get_req(dst, src) \ +{ \ + int i, n = BLKIF_MAX_SEGMENTS_PER_REQUEST; \ + dst->operation = src->operation; \ + dst->nr_segments = src->nr_segments; \ + dst->handle = src->handle; \ + dst->id = src->id; \ + dst->sector_number = src->sector_number; \ + xen_rmb(); \ + if (n > dst->nr_segments) \ + n = dst->nr_segments; \ + for (i = 0; i < n; i++) \ + dst->seg[i] = src->seg[i]; \ +} + +/** + * Utility function that retrieves a request using @idx as the ring index, + * copying it to the @dst in a H/W independent way. + * + * @param blkif the block interface + * @param dst address that receives the request + * @param rc the index of the request in the ring + */ +static inline void +xenio_blkif_get_request(struct td_xenblkif * const blkif, + blkif_request_t *const dst, const RING_IDX idx) +{ + blkif_back_rings_t * rings; + + assert(blkif); + assert(dst); + + rings = &blkif->rings; + + switch (blkif->proto) { + case BLKIF_PROTOCOL_NATIVE: + { + blkif_request_t *src; + src = RING_GET_REQUEST(&rings->native, idx); + memcpy(dst, src, sizeof(blkif_request_t)); + break; + } + + case BLKIF_PROTOCOL_X86_32: + { + blkif_x86_32_request_t *src; + src = RING_GET_REQUEST(&rings->x86_32, idx); + blkif_get_req(dst, src); + break; + } + + case BLKIF_PROTOCOL_X86_64: + { + blkif_x86_64_request_t *src; + src = RING_GET_REQUEST(&rings->x86_64, idx); + blkif_get_req(dst, src); + break; + } + + default: + /* + * TODO log error + */ + assert(0); + } +} + +/** + * Retrieves at most @count request descriptors from the ring, copying them to + * @reqs. + * + * @param blkif the block interface + * @param reqs array of pointers where each element points to sufficient memory + * space that receives each request descriptor + * @param count retrieve at most that many request descriptors + * @returns the number of retrieved request descriptors + * + * XXX only called by xenio_blkif_get_requests + */ +static inline int +__xenio_blkif_get_requests(struct td_xenblkif * const blkif, + blkif_request_t *reqs[], const unsigned int count) +{ + blkif_common_back_ring_t * ring; + RING_IDX rp, rc; + int n; + + assert(blkif); + assert(reqs); + + if (!count) + return 0; + + ring = &blkif->rings.common; + + rp = ring->sring->req_prod; + xen_rmb(); /* TODO why? */ + + for (rc = ring->req_cons, n = 0; rc != rp; rc++) { + blkif_request_t *dst = reqs[n]; + + if (n++ >= count) + break; + + xenio_blkif_get_request(blkif, dst, rc); + } + + ring->req_cons = rc; + + return n; +} + +/** + * Retrieves at most @count request descriptors. 
+ * + * @param blkif the block interface + * @param reqs array of pointers where each pointer points to sufficient + * memory to hold a request descriptor + * @count maximum number of request descriptors to retrieve + * @param final re-enable notifications before it stops reading + * @returns the number of request descriptors retrieved + * + * TODO change name + */ +static inline int +xenio_blkif_get_requests(struct td_xenblkif * const blkif, + blkif_request_t *reqs[], const int count, const int final) +{ + blkif_common_back_ring_t * ring; + int n = 0; + int work = 0; + + assert(blkif); + assert(reqs); + assert(count > 0); + + ring = &blkif->rings.common; + + do { + if (final) + RING_FINAL_CHECK_FOR_REQUESTS(ring, work); + else + work = RING_HAS_UNCONSUMED_REQUESTS(ring); + + if (!work) + break; + + if (n >= count) + break; + + n += __xenio_blkif_get_requests(blkif, reqs + n, count - n); + } while (1); + + return n; +} + +/** + * Callback executed when there is a request descriptor in the ring. Copies as + * many request descriptors as possible (limited by local buffer space) to the + * td_blkif''s local request buffer and queues them to the tapdisk queue. + */ +static inline void +tapdisk_xenio_ctx_ring_event(event_id_t id __attribute__((unused)), + char mode __attribute__((unused)), void *private) +{ + struct td_xenio_ctx *ctx = private; + struct td_xenblkif *blkif = NULL; + int n_reqs; + int final = 0; + int start; + blkif_request_t **reqs; + + assert(ctx); + + blkif = xenio_pending_blkif(ctx); + if (!blkif) { + /* TODO log error */ + return; + } + + start = blkif->n_reqs_free; + blkif->stats.kicks.in++; + + /* + * In each iteration, copy as many request descriptors from the shared ring + * that can fit in the td_blkif''s buffer. + */ + do { + reqs = &blkif->reqs_free[blkif->ring_size - blkif->n_reqs_free]; + + assert(reqs); + + n_reqs = xenio_blkif_get_requests(blkif, reqs, blkif->n_reqs_free, + final); + assert(n_reqs >= 0); + if (!n_reqs) + break; + + blkif->n_reqs_free -= n_reqs; + final = 1; + + } while (1); + + n_reqs = start - blkif->n_reqs_free; + if (!n_reqs) + /* TODO If there are no requests to be copied, why was there a + * notification in the first place? + */ + return; + blkif->stats.reqs.in += n_reqs; + reqs = &blkif->reqs_free[blkif->ring_size - start]; + tapdisk_xenblkif_queue_requests(blkif, reqs, n_reqs); +} + +/* NB. may be NULL, but then the image must be bouncing I/O */ +#define TD_XENBLKIF_DEFAULT_POOL "td-xenio-default" + +/** + * Opens a context on the specified pool. + * + * @param pool the pool, it can either be NULL or a non-zero length string + * @returns 0 in success + * + * FIXME The pool is ignored, we always open the default pool. + */ +static inline int +tapdisk_xenio_ctx_open(const char *pool) +{ + struct td_xenio_ctx *ctx; + int fd, err; + + /* zero-length pool names are not allowed */ + if (pool && !strlen(pool)) + return EINVAL; + + ctx = calloc(1, sizeof(*ctx)); + if (!ctx) { + err = errno; + syslog(LOG_ERR, "cannot allocate memory"); + goto fail; + } + + ctx->ring_event = -1; /* TODO is there a special value? 
*/ + ctx->pool = TD_XENBLKIF_DEFAULT_POOL; + TAILQ_INIT(&ctx->blkifs); + TAILQ_INSERT_HEAD(&_td_xenio_ctxs, ctx, entry); + + ctx->xce_handle = xc_evtchn_open(NULL, 0); + if (ctx->xce_handle == NULL) { + err = errno; + syslog(LOG_ERR, "failed to open the event channel driver: %s\n", + strerror(err)); + goto fail; + } + + ctx->xcg_handle = xc_gnttab_open(NULL, 0); + if (ctx->xcg_handle == NULL) { + err = errno; + syslog(LOG_ERR, "failed to open the grant table driver: %s\n", + strerror(err)); + goto fail; + } + + fd = xc_evtchn_fd(ctx->xce_handle); + if (fd < 0) { + err = errno; + syslog(LOG_ERR, "failed to get the event channel file descriptor: %s\n", + strerror(err)); + goto fail; + } + + ctx->ring_event = tapdisk_server_register_event(SCHEDULER_POLL_READ_FD, + fd, 0, tapdisk_xenio_ctx_ring_event, ctx); + if (ctx->ring_event < 0) { + err = -ctx->ring_event; + syslog(LOG_ERR, "failed to register event: %s\n", strerror(err)); + goto fail; + } + + return 0; + +fail: + tapdisk_xenio_ctx_close(ctx); + return err; +} + + +/** + * Tells whether @ctx belongs to @pool. + * + * If no @pool is not specified and a default pool is set, @ctx is compared + * against the default pool. Note that NULL is valid pool name value. + */ +static inline int +__td_xenio_ctx_match(struct td_xenio_ctx * ctx, const char *pool) +{ + if (unlikely(!pool)) { + if (NULL != TD_XENBLKIF_DEFAULT_POOL) + return !strcmp(ctx->pool, TD_XENBLKIF_DEFAULT_POOL); + else + return !ctx->pool; + } + + return !strcmp(ctx->pool, pool); +} + +#define tapdisk_xenio_find_ctx(_ctx, _cond) \ + do { \ + int found = 0; \ + tapdisk_xenio_for_each_ctx(_ctx) { \ + if (_cond) { \ + found = 1; \ + break; \ + } \ + } \ + if (!found) \ + _ctx = NULL; \ + } while (0) + +int +tapdisk_xenio_ctx_get(const char *pool, struct td_xenio_ctx ** _ctx) +{ + struct td_xenio_ctx *ctx; + int err = 0; + + do { + tapdisk_xenio_find_ctx(ctx, __td_xenio_ctx_match(ctx, pool)); + if (ctx) { + *_ctx = ctx; + return 0; + } + + err = tapdisk_xenio_ctx_open(pool); + } while (!err); + + return err; +} + +void +tapdisk_xenio_ctx_put(struct td_xenio_ctx * ctx) +{ + if (TAILQ_EMPTY(&ctx->blkifs)) + tapdisk_xenio_ctx_close(ctx); +} diff -r 93727c736ff0 -r 8a9e0b69b44a tools/blktap3/drivers/sring/td-ctx.h --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/blktap3/drivers/sring/td-ctx.h Tue Mar 12 12:55:57 2013 +0000 @@ -0,0 +1,102 @@ +/* + * Copyright (C) 2012 Citrix Ltd. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, + * USA. + */ + +#ifndef __TD_CTX_H__ +#define __TD_CTX_H__ + +#include "td-blkif.h" +#include <xenctrl.h> +#include "scheduler.h" + +TAILQ_HEAD(tqh_td_xenblkif, td_xenblkif); + +/** + * A VBD context: groups two or more VBDs of the same tapdisk. + */ +struct td_xenio_ctx { + char *pool; /* TODO rename to pool_name */ + + /** + * Handle to the grant table driver. 
+ */ + xc_gnttab *xcg_handle; + + /** + * Handle to the event channel driver. + */ + xc_evtchn *xce_handle; + + /** + * Return value of tapdisk_server_register_event, we use this to tell + * whether the context is registered. + */ + event_id_t ring_event; + + /** + * block interfaces in this pool + */ + struct tqh_td_xenblkif blkifs; + + /** + * for linked lists + */ + TAILQ_ENTRY(td_xenio_ctx) entry; +}; + +/** + * Retrieves the context corresponding to the specified pool name, creating it + * if it doesn''t already exist. + */ +int +tapdisk_xenio_ctx_get(const char *pool, struct td_xenio_ctx ** _ctx); + +/** + * Releases the pool, only if there is no block interface using it. + */ +void +tapdisk_xenio_ctx_put(struct td_xenio_ctx * ctx); + +/** + * List of contexts. + */ +extern TAILQ_HEAD(tqh_td_xenio_ctx, td_xenio_ctx) _td_xenio_ctxs; + +/** + * For each block interface of this context... + */ +#define tapdisk_xenio_for_each_blkif(_blkif, _ctx) \ + TAILQ_FOREACH(_blkif, &(_ctx)->blkifs, entry) + +/** + * Search this context for the block interface for which the condition is true. + */ +#define tapdisk_xenio_ctx_find_blkif(_ctx, _blkif, _cond) \ + do { \ + int found = 0; \ + tapdisk_xenio_for_each_blkif(_blkif, _ctx) { \ + if (_cond) { \ + found = 1; \ + break; \ + } \ + } \ + if (!found) \ + _blkif = NULL; \ + } while (0) + +#endif /* __TD_CTX_H__ */
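For reference, the generic Xen shared-ring consumer pattern that xenio_blkif_get_requests()/__xenio_blkif_get_requests() build on looks roughly like the sketch below (native protocol only, using the same barrier macro the series itself relies on). It is an illustration, not a drop-in replacement for the code above; process() is a placeholder for handing the request to the tapdisk queue.

/* Minimal, self-contained illustration of the standard shared-ring consumer
 * loop: copy descriptors out of the ring, advance req_cons, then re-enable
 * notifications and re-check for requests that raced in. */
#include <string.h>
#include <xen/io/blkif.h>
#include <xen/io/ring.h>

static void
process(const blkif_request_t *req)
{
    (void)req;   /* in the real code this is queued to the tapdisk VBD */
}

static void
consume_ring_sketch(blkif_back_ring_t *ring)
{
    RING_IDX rc, rp;
    int more;

    do {
        rp = ring->sring->req_prod;
        xen_rmb();              /* see all requests up to rp before reading them */

        for (rc = ring->req_cons; rc != rp; rc++) {
            blkif_request_t req;

            /* Copy the descriptor out of the shared page before using it:
             * the guest could modify it under our feet otherwise. */
            memcpy(&req, RING_GET_REQUEST(ring, rc), sizeof(req));
            process(&req);
        }
        ring->req_cons = rc;

        /* Re-enable notifications and check for requests that raced in. */
        RING_FINAL_CHECK_FOR_REQUESTS(ring, more);
    } while (more);
}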
Thanos Makatos
2013-Mar-12 12:56 UTC
[PATCH 3 of 4 RFC] blktap3/sring: parse ring requests and handle them over to tapdisk
This patch introduces the functionality of parsing shared ring requests and adding them to tapdisk''s standard request queue. Also, it provides the functionality of producing responses. Signed-off-by: Thanos Makatos <thanos.makatos@citrix.com> diff -r 8a9e0b69b44a -r 8e86ca826085 tools/blktap3/drivers/sring/td-req.c --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/blktap3/drivers/sring/td-req.c Tue Mar 12 12:56:01 2013 +0000 @@ -0,0 +1,505 @@ +/* + * Copyright (C) 2012 Citrix Ltd. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, + * USA. + */ + +#include "td-req.h" +#include "td-blkif.h" +#include <assert.h> +#include <stdlib.h> +#include <errno.h> +#include "td-ctx.h" +#include <syslog.h> +#include <inttypes.h> +#include "tapdisk-vbd.h" + +/* + * TODO needed for PROT_READ/PROT_WRITE, probably some xen header supplies them + * too + */ +#include <sys/mman.h> + +/** + * Puts the request back to the free list of this block interface. + * + * @param blkif the block interface + * @param tapreq the request to give back + */ +static void +tapdisk_xenblkif_free_request(struct td_xenblkif * const blkif, + struct td_xenblkif_req * const tapreq) +{ + assert(blkif); + assert(tapreq); + assert(blkif->n_reqs_free <= blkif->ring_size); + + blkif->reqs_free[blkif->ring_size - (++blkif->n_reqs_free)] = &tapreq->msg; +} + +/** + * Returns the size, in request descriptors, of the shared ring + * + * @param blkif the block interface + * @returns the size, in request descriptors, of the shared ring + */ +static int +td_blkif_ring_size(const struct td_xenblkif * const blkif) +{ + assert(blkif); + + switch (blkif->proto) { + case BLKIF_PROTOCOL_NATIVE: + return RING_SIZE(&blkif->rings.native); + + case BLKIF_PROTOCOL_X86_32: + return RING_SIZE(&blkif->rings.x86_32); + + case BLKIF_PROTOCOL_X86_64: + return RING_SIZE(&blkif->rings.x86_64); + + default: + return -EPROTONOSUPPORT; + } +} + +/** + * Unmaps a request''s data. + * + * @param blkif the block interface the request belongs to + * @param req the request to unmap + */ +static int +xenio_blkif_munmap_one(struct td_xenblkif * const blkif, + struct td_xenblkif_req * const req) +{ + struct td_xenio_ctx *ctx; + int err; + + assert(blkif); + assert(req); + + ctx = blkif->ctx; + assert(ctx); + + err = xc_gnttab_munmap(ctx->xcg_handle, req->vma, req->nr_segments); + if (err) { + err = errno; + /* TODO don''t use syslog for error on the data path */ + syslog(LOG_ERR, "failed to unmap pages: %s\n", strerror(err)); + return err; + } + + req->vma = NULL; + return 0; +} + +/** + * Get the response that corresponds to the specified ring index in a H/W + * independent way. 
+ * + * TODO use function pointers instead of switch + * XXX only called by xenio_blkif_put_response + */ +static inline +blkif_response_t *xenio_blkif_get_response(struct td_xenblkif* const blkif, + const RING_IDX rp) +{ + blkif_back_rings_t * const rings = &blkif->rings; + blkif_response_t * p = NULL; + + switch (blkif->proto) { + case BLKIF_PROTOCOL_NATIVE: + p = (blkif_response_t *) RING_GET_RESPONSE(&rings->native, rp); + break; + case BLKIF_PROTOCOL_X86_32: + p = (blkif_response_t *) RING_GET_RESPONSE(&rings->x86_32, rp); + break; + case BLKIF_PROTOCOL_X86_64: + p = (blkif_response_t *) RING_GET_RESPONSE(&rings->x86_64, rp); + break; + default: + /* TODO gracefully fail? */ + abort(); + } + + return p; +} + +/** + * Puts a response in the ring. + * + * @param blkif the VBD + * @param req the request for which the response should be put + * @param status the status of the response (success or an error code) + * @param final controls whether the front-end will be notified, if necessary + * + * FIXME @req can be NULL so the function will only notify the other end. This + * is used in the error path of tapdisk_xenblkif_queue_requests. The point is + * that the other will just be notified, does this make sense? + */ +static int +xenio_blkif_put_response(struct td_xenblkif * const blkif, + struct td_xenblkif_req *req, int const status, int const final) +{ + blkif_common_back_ring_t * const ring = &blkif->rings.common; + + if (req) { + blkif_response_t * msg = xenio_blkif_get_response(blkif, + ring->rsp_prod_pvt); + assert(msg); + + assert(status == BLKIF_RSP_EOPNOTSUPP || status == BLKIF_RSP_ERROR + || status == BLKIF_RSP_OKAY); + + msg->id = req->id; + + /* FIXME Why do we have to set this? */ + msg->operation = req->op; + + msg->status = status; + + ring->rsp_prod_pvt++; + } + + if (final) { + int notify; + RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(ring, notify); + if (notify) { + int err = xc_evtchn_notify(blkif->ctx->xce_handle, blkif->port); + if (err < 0) { + /* TODO log error */ + return errno; + } + } + } + + return 0; +} + +/** + * Completes a request. + * + * @blkif the VBD the request belongs belongs to + * @tapreq the request to complete + * @error completion status of the request + * @final controls whether the other end should be notified + */ +static void +tapdisk_xenblkif_complete_request(struct td_xenblkif * const blkif, + struct td_xenblkif_req* tapreq, const int error, const int final) +{ + assert(blkif); + assert(tapreq); + + if (tapreq->vma) { + int err = 0; + err = xenio_blkif_munmap_one(blkif, tapreq); + /* FIXME error handling */ + } + + xenio_blkif_put_response(blkif, tapreq, + (error ? BLKIF_RSP_ERROR : BLKIF_RSP_OKAY), final); + + tapdisk_xenblkif_free_request(blkif, tapreq); + + blkif->stats.reqs.out++; + if (final) + blkif->stats.kicks.out++; +} + +/** + * Request completion callback, executed when the tapdisk has finished + * processing the request. + * + * @param vreq the completed request + * @param error status of the request + * @param token token previously associated with this request + * @param final TODO ? 
+ */ +static inline void +__tapdisk_xenblkif_request_cb(struct td_vbd_request * const vreq, + const int error, void * const token, const int final) +{ + struct td_xenblkif_req *tapreq; + struct td_xenblkif * const blkif = token; + + assert(vreq); + assert(blkif); + + tapreq = containerof(vreq, struct td_xenblkif_req, vreq); + + tapdisk_xenblkif_complete_request(blkif, tapreq, error, final); + if (error) + blkif->stats.errors.img++; +} + +/** + * Initialises the standard tapdisk request (td_vbd_request_t) from the + * intermediate ring request (td_xenblkif_req) in order to prepare it + * processing. + * + * @param blkif the block interface + * @param tapreq the request to prepare + * @returns 0 on success + * + * TODO only called by tapdisk_xenblkif_queue_request + */ +static inline int +tapdisk_xenblkif_make_vbd_request(struct td_xenblkif * const blkif, + struct td_xenblkif_req * const tapreq) +{ + td_vbd_request_t *vreq; + int i; + struct td_iovec *iov; + void *page, *next, *last; + int prot; + grant_ref_t gref[BLKIF_MAX_SEGMENTS_PER_REQUEST]; + + assert(tapreq); + + vreq = &tapreq->vreq; + assert(vreq); + + switch (tapreq->msg.operation) { + case BLKIF_OP_READ: + tapreq->op = BLKIF_OP_READ; + vreq->op = TD_OP_READ; + prot = PROT_WRITE; + break; + case BLKIF_OP_WRITE: + tapreq->op = BLKIF_OP_WRITE; + vreq->op = TD_OP_WRITE; + prot = PROT_READ; + break; + default: + /* TODO log error */ + return EOPNOTSUPP; + } + + /* TODO there should be at least one segment, right? */ + if (tapreq->msg.nr_segments < 1 + || tapreq->msg.nr_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST) { + /* TODO log error */ + return EINVAL; + } + + for (i = 0; i < tapreq->msg.nr_segments; i++) { + struct blkif_request_segment *seg = &tapreq->msg.seg[i]; + gref[i] = seg->gref; + + /* + * Note that first and last may be equal, which means only one sector + * must be transferred. + */ + if (seg->last_sect < seg->first_sect) { + /* TODO log error */ + return EINVAL; + } + } + + tapreq->nr_segments = tapreq->msg.nr_segments; + + /* + * Map the request''s data. + */ + tapreq->vma = xc_gnttab_map_domain_grant_refs(blkif->ctx->xcg_handle, + tapreq->nr_segments, blkif->domid, gref, prot); + if (!tapreq->vma) { + /* TODO log error */ + return errno; + } + + tapreq->id = tapreq->msg.id; + + /* + * Vectorizes the request: creates the struct iovec (in tapreq->iov) that + * describes each segment to be transferred. Also, merges consecutive + * segments. + * TODO The following piece of code would be much simpler if we didn''t + * merge segments, right? + * + * In each loop, iov points to the previous scatter/gather element in order + * to reuse it if the current and previous segments are consecutive. + */ + iov = tapreq->iov - 1; + last = NULL; + page = tapreq->vma; + + for (i = 0; i < tapreq->nr_segments; i++) { /* for each segment */ + struct blkif_request_segment *seg = &tapreq->msg.seg[i]; + size_t size; + + /* TODO check that first_sect/last_sect are within page */ + + next = page + (seg->first_sect << SECTOR_SHIFT); + size = seg->last_sect - seg->first_sect + 1; + + if (next != last) { + iov++; + iov->base = next; + iov->secs = size; + } else /* The "else" is true if fist_sect is 0. */ + iov->secs += size; + + last = iov->base + (iov->secs << SECTOR_SHIFT); + page += XC_PAGE_SIZE; + } + + vreq->iov = tapreq->iov; + vreq->iovcnt = iov - tapreq->iov + 1; + vreq->sec = tapreq->msg.sector_number; + + /* + * TODO Isn''t this kind of expensive to do for each requests? Why does the + * tapdisk need this in the first place? 
+ */ + snprintf(tapreq->name, sizeof(tapreq->name), "xenvbd-%d-%d.%"SCNx64"", + blkif->domid, blkif->devid, tapreq->msg.id); + + vreq->name = tapreq->name; + vreq->token = blkif; + vreq->cb = __tapdisk_xenblkif_request_cb; + + return 0; +} + +#define msg_to_tapreq(_req) \ + containerof(_req, struct td_xenblkif_req, msg) + +/** + * Queues a ring request, after it prepares it, to the standard taodisk queue + * for processing. + * + * @param blkif the block interface + * @param msg the ring request + * @param tapreq the intermediate request + * + * TODO don''t really need to supply the ring request since it''s either way + * contained in the tapreq + * + * XXX only called by tapdisk_xenblkif_queue_requests + */ +static inline int +tapdisk_xenblkif_queue_request(struct td_xenblkif * const blkif, + blkif_request_t *msg, struct td_xenblkif_req *tapreq) +{ + int err; + + assert(blkif); + assert(msg); + assert(tapreq); + + err = tapdisk_xenblkif_make_vbd_request(blkif, tapreq); + if (err) { + /* TODO log error */ + blkif->stats.errors.map++; + return err; + } + + err = tapdisk_vbd_queue_request(blkif->vbd, &tapreq->vreq); + if (err) { + /* TODO log error */ + blkif->stats.errors.vbd++; + return err; + } + + return 0; +} + +void +tapdisk_xenblkif_queue_requests(struct td_xenblkif * const blkif, + blkif_request_t *reqs[], const int nr_reqs) +{ + int i; + int err; + int nr_errors = 0; + + assert(blkif); + assert(reqs); + assert(nr_reqs >= 0); + + for (i = 0; i < nr_reqs; i++) { /* for each request from the ring... */ + blkif_request_t *msg = reqs[i]; + struct td_xenblkif_req *tapreq; + + assert(msg); + + tapreq = msg_to_tapreq(msg); + + assert(tapreq); + + err = tapdisk_xenblkif_queue_request(blkif, msg, tapreq); + if (err) { + /* TODO log error */ + nr_errors++; + tapdisk_xenblkif_complete_request(blkif, tapreq, err, 1); + } + } + + if (nr_errors) + xenio_blkif_put_response(blkif, NULL, 0, 1); +} + +void +tapdisk_xenblkif_reqs_free(struct td_xenblkif * const blkif) +{ + assert(blkif); + + free(blkif->reqs); + blkif->reqs = NULL; + + free(blkif->reqs_free); + blkif->reqs_free = NULL; +} + +int +tapdisk_xenblkif_reqs_init(struct td_xenblkif *td_blkif) +{ + int i = 0; + int err = 0; + + assert(td_blkif); + + td_blkif->ring_size = td_blkif_ring_size(td_blkif); + assert(td_blkif->ring_size > 0); + + td_blkif->reqs + malloc(td_blkif->ring_size * sizeof(struct td_xenblkif_req)); + if (!td_blkif->reqs) { + /* TODO log error */ + err = -errno; + goto fail; + } + + td_blkif->reqs_free + malloc(td_blkif->ring_size * sizeof(struct xenio_blkif_req *)); + if (!td_blkif->reqs_free) { + /* TODO log error */ + err = -errno; + goto fail; + } + + td_blkif->n_reqs_free = 0; + for (i = 0; i < td_blkif->ring_size; i++) + tapdisk_xenblkif_free_request(td_blkif, &td_blkif->reqs[i]); + + return 0; + +fail: + tapdisk_xenblkif_reqs_free(td_blkif); + return err; +} diff -r 8a9e0b69b44a -r 8e86ca826085 tools/blktap3/drivers/sring/td-req.h --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/blktap3/drivers/sring/td-req.h Tue Mar 12 12:56:01 2013 +0000 @@ -0,0 +1,131 @@ +/* + * Copyright (C) 2012 Citrix Ltd. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. 
+ * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, + * USA. + */ + +#ifndef __TD_REQ_H__ +#define __TD_REQ_H__ + +#include "tapdisk.h" +#include <sys/types.h> +#include <xen/io/blkif.h> +#include <sys/uio.h> +#include "td-blkif.h" + +/** + * Representation of the intermediate request used to retrieve a request from + * the shared ring and handle it over to the main tapdisk request processing + * routine. We could merge it into td_vbd_request_t or define it inside + * td_vbd_request_t, but keeping it separate simplifies keeping Xen stuff + * outside tapdisk. + * + * TODO rename to something better, e.g. ring_req? + */ +struct td_xenblkif_req { + /** + * A request descriptor in the ring. We need to copy the descriptors + * because the guest may modify it while we''re using it. Note that we only + * copy the descriptor and not the actual data, the guest is free to modify + * the data and corrupt itself if it wants to. + */ + blkif_request_t msg; + + /** + * tapdisk''s representation of the request. + */ + td_vbd_request_t vreq; + + /* + * FIXME xenio_blkif_get_request copies the request from the shared ring + * locally, into this.msg, so don''t need to keep a copy of id, op, and + * nr_segments. + */ + + /** + * Request id, must be echoed in response, according to the definition of + * blkif_request. + */ + uint64_t id; + + /** + * operation: read/write + * FIXME We maintain this here because we set it in the message when + * pushing the response. The question is whether we really need to set it + * in the first place. + * + * TODO Do we have to keep it here because blkif_request_t may be changed + * by the guest? + */ + uint8_t op; + + /** + * Number of segments. + * + * TODO Do we have to keep it here because blkif_request_t may be changed + * by the guest? + */ + int nr_segments; + + /** + * Pointer to memory-mapped grant refs. We keep this around because we need + * to pass it to xc_gnttab_munmap when the requests is completed. + */ + void *vma; + + /* + * FIXME Why 16+1? This member is copied to the corresponding one in + * td_vbd_request_t, so check the limit of that, if there is one. + */ + char name[16 + 1]; + + /** + * The scatter/gather list td_vbd_request_t.iov points to. + */ + struct td_iovec iov[BLKIF_MAX_SEGMENTS_PER_REQUEST]; +}; + +struct td_xenblkif; + +/** + * Queues the requests to the standard tapdisk queue. + * + * @param td_blkif the block interface corresponding to the VBD + * @param reqs array holding the request rescriptors + * @param nr_reqs number of requests in the array + */ +void +tapdisk_xenblkif_queue_requests(struct td_xenblkif * const blkif, + blkif_request_t *reqs[], const int nr_reqs); + +/** + * Initilises the intermediate requests of this block interface. + * + * @params td_blkif the block interface whose requests must be initialised + * @returns 0 on success + */ +int +tapdisk_xenblkif_reqs_init(struct td_xenblkif *td_blkif); + +/** + * Releases all the requests of the block interface. 
+ * + * @param blkif the block interface whose requests should be freed + */ +void +tapdisk_xenblkif_reqs_free(struct td_xenblkif * const blkif); + +#endif /* __TD_REQ_H__ */
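The response path that xenio_blkif_put_response() implements follows the standard back-end pattern: write the response behind rsp_prod_pvt, push it, and kick the event channel only if the front-end asked to be notified. A minimal sketch for the native protocol, with error handling abbreviated, is shown below; it is illustrative only.

/* Minimal illustration of the back-end response path (native protocol). */
#include <errno.h>
#include <xenctrl.h>
#include <xen/io/blkif.h>
#include <xen/io/ring.h>

static int
put_response_sketch(xc_evtchn *xce, evtchn_port_t local_port,
                    blkif_back_ring_t *ring, uint64_t id, uint8_t op,
                    int16_t status)
{
    blkif_response_t *rsp;
    int notify;

    rsp = RING_GET_RESPONSE(ring, ring->rsp_prod_pvt);
    rsp->id = id;             /* echoed from the request */
    rsp->operation = op;      /* echoed from the request */
    rsp->status = status;     /* BLKIF_RSP_OKAY / _ERROR / _EOPNOTSUPP */
    ring->rsp_prod_pvt++;

    /* Publishes rsp_prod and tells us whether the front-end wants a kick. */
    RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(ring, notify);
    if (notify && xc_evtchn_notify(xce, local_port) < 0)
        return errno;

    return 0;
}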
Thanos Makatos
2013-Mar-12 12:56 UTC
[PATCH 4 of 4 RFC] blktap3/sring: stats for the shared ring between tapdisk and the front-end
This patch introduces stats for the shared ring between tapdisk and the front-end. I suspect that the stats regarding the ring between tapdisk and blktap don''t make sense any more, so the stats introduced by this patch could be consolidated into the existing ones. Signed-off-by: Thanos Makatos <thanos.makatos@citrix.com> diff -r 8e86ca826085 -r 587c35050d21 tools/blktap3/drivers/sring/td-stats.c --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/blktap3/drivers/sring/td-stats.c Tue Mar 12 12:56:02 2013 +0000 @@ -0,0 +1,81 @@ +/* + * Copyright (C) 2012 Citrix Ltd. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, + * USA. + */ + +#include "td-stats.h" +#include "td-blkif.h" +#include "td-ctx.h" + +static inline void +__tapdisk_xenblkif_stats(struct td_xenblkif * blkif, td_stats_t * st) +{ + tapdisk_stats_field(st, "pool", blkif->ctx->pool); + tapdisk_stats_field(st, "domid", "d", blkif->domid); + tapdisk_stats_field(st, "devid", "d", blkif->devid); + + tapdisk_stats_field(st, "reqs", "["); + tapdisk_stats_val(st, "llu", blkif->stats.reqs.in); + tapdisk_stats_val(st, "llu", blkif->stats.reqs.out); + tapdisk_stats_leave(st, '']''); + + tapdisk_stats_field(st, "kicks", "["); + tapdisk_stats_val(st, "llu", blkif->stats.kicks.in); + tapdisk_stats_val(st, "llu", blkif->stats.kicks.out); + tapdisk_stats_leave(st, '']''); + + tapdisk_stats_field(st, "errors", "{"); + tapdisk_stats_field(st, "msg", "llu", blkif->stats.errors.msg); + tapdisk_stats_field(st, "map", "llu", blkif->stats.errors.map); + tapdisk_stats_field(st, "vbq", "llu", blkif->stats.errors.vbd); + tapdisk_stats_field(st, "img", "llu", blkif->stats.errors.img); + tapdisk_stats_leave(st, '']''); +} + +void +tapdisk_xenblkif_stats(td_vbd_t * vbd, td_stats_t * st) +{ + struct td_xenblkif *blkif; + struct td_xenio_ctx *ctx; + int matches; + + tapdisk_xenio_for_each_ctx(ctx) { + tapdisk_xenio_for_each_blkif(blkif, ctx) { + if (blkif->vbd == vbd) { + matches = 1; + break; + } + } + } + if (!matches) + return; + + tapdisk_stats_field(st, "xen-blkifs", "["); + + tapdisk_xenio_for_each_ctx(ctx) { + tapdisk_xenio_for_each_blkif(blkif, ctx) { + if (blkif->vbd != vbd) + continue; + + tapdisk_stats_enter(st, ''{''); + __tapdisk_xenblkif_stats(blkif, st); + tapdisk_stats_leave(st, ''}''); + } + } + + tapdisk_stats_leave(st, '']''); +} diff -r 8e86ca826085 -r 587c35050d21 tools/blktap3/drivers/sring/td-stats.h --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tools/blktap3/drivers/sring/td-stats.h Tue Mar 12 12:56:02 2013 +0000 @@ -0,0 +1,39 @@ +/* + * Copyright (C) 2012 Citrix Ltd. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. 
+ * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, + * USA. + */ + +#ifndef __TD_STATS_H__ +#define __TD_STATS_H__ + +struct td_xenblkif_stats { + struct { + unsigned long long in; + unsigned long long out; + } reqs; + struct { + unsigned long long in; + unsigned long long out; + } kicks; + struct { + unsigned long long msg; + unsigned long long map; + unsigned long long vbd; + unsigned long long img; + } errors; +}; +#endif /* __TD_STATS_H__ */
Thanos Makatos
2013-Mar-14 17:24 UTC
Re: [PATCH 2 of 4 RFC] blktap3/sring: extract requests from the ring, grouping of VBDs
> Also, there is another piece of functionality introduced for which I
> have some questions: two or more VBDs served by the same tapdisk
> process can be grouped into a "context". This context carries the
> handle to the event channel driver used to bind to the guest domain's
> port in order to receive/send notifications for the shared ring. I
> don't see any merit by this grouping and I think it should be removed,
> is there something I am overseeing?

Actually, I think I just realized what the purpose of this grouping is: to reduce the number of event channels used for sring notifications, right? Correct me if I'm wrong, but is there some ongoing work on increasing the number of event channels, rendering this optimisation useless?
Ian Campbell
2013-Mar-15 09:28 UTC
Re: [PATCH 2 of 4 RFC] blktap3/sring: extract requests from the ring, grouping of VBDs
On Thu, 2013-03-14 at 17:24 +0000, Thanos Makatos wrote:

> > Also, there is another piece of functionality introduced for which I
> > have some questions: two or more VBDs served by the same tapdisk
> > process can be grouped into a "context". This context carries the
> > handle to the event channel driver used to bind to the guest domain's
> > port in order to receive/send notifications for the shared ring. I
> > don't see any merit by this grouping and I think it should be removed,
> > is there something I am overseeing?
>
> Actually, I think I just realized what the purpose of this grouping is:
> to reduce the number of event channels used for sring notifications,
> right? Correct me if I'm wrong, but is there some ongoing work on
> increasing the number of event channels, rendering this optimisation
> useless?

I would expect that you still need an event channel per VBD (since event channels are point to point), but perhaps this grouping allows you to save on the number of open file descriptors for /dev/xen/evtchn? (NB: I've not looked at the code at all.)

In general I would say that, for both fds and event channels, regardless of the current limits, it is better to try to minimise the number required, so long as the overhead/complexity of doing so is not too big. Even if you don't hit any actual hard limits, I would expect there to be soft limits relating to scalability, e.g. the limitations of select()/poll(), as well as the overhead inside the kernel of maintaining large numbers of open file descriptors.

Ian.
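The saving Ian describes corresponds to the pattern td-ctx.c already uses: a single xc_evtchn handle (one /dev/xen/evtchn file descriptor in the main select() loop) can serve many bound ports, and xc_evtchn_pending() identifies which port fired. A minimal sketch of that demultiplexing, with the VBD lookup and error handling elided, is:

/* One readable fd, potentially many VBDs behind it.  Sketch only. */
#include <xenctrl.h>

static void
demux_one_fd_sketch(xc_evtchn *xce)
{
    evtchn_port_or_error_t port;

    /* Find out which local port has a pending event. */
    port = xc_evtchn_pending(xce);
    if (port < 0)
        return;

    /* ... look up the VBD bound to 'port' and service its ring ... */

    /* Re-arm delivery for this port before returning to select(). */
    xc_evtchn_unmask(xce, port);
}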