Thanos Makatos
2013-Jul-15 11:41 UTC
[PATCH 0 of 6 RESEND v2] blktap3/sring: shared ring between tapdisk and the front-end
This patch series introduces the shared ring that the front-end uses to pass request descriptors to tapdisk, and that tapdisk uses to pass responses back to the front-end. Requests from this ring end up in tapdisk's standard request queue. When the tapback daemon detects that the front-end is trying to connect to the back-end, it spawns a tapdisk and tells it to connect to the shared ring. Tapdisk creates the shared ring from the grant references and the event channel port supplied by tapback. Once the ring is created, tapdisk watches the event channel for notifications. When a notification is received, tapdisk extracts the request, parses it, and passes it to the standard tapdisk request queue for processing.
Signed-off-by: Thanos Makatos <thanos.makatos@citrix.com>
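The request/response flow described above can be illustrated with a toy ring. This is a minimal sketch under stated assumptions, not the blktap3 implementation: every name in it (`sketch_ring`, `frontend_push`, `backend_on_notify`, and so on) is hypothetical, the descriptor layouts are invented for illustration, and the grant mapping, event channel, and memory barriers that a real Xen ring needs are left out. It only shows the producer/consumer index arithmetic that lets the back-end drain every request published since the last notification.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SIZE 8u                 /* must be a power of two */

/* Hypothetical descriptors; the real layouts come from the Xen blkif ABI. */
struct sketch_req { uint64_t id; uint64_t sector; uint8_t write; };
struct sketch_rsp { uint64_t id; int16_t status; };

/* Stand-in for the granted page that both sides can see. */
struct sketch_ring {
    uint32_t req_prod;               /* advanced by the front-end only */
    uint32_t rsp_prod;               /* advanced by the back-end only  */
    struct sketch_req req[RING_SIZE];
    struct sketch_rsp rsp[RING_SIZE];
};

/* Each side keeps its consumer index private. */
struct sketch_frontend { struct sketch_ring *ring; uint32_t rsp_cons; };
struct sketch_backend  { struct sketch_ring *ring; uint32_t req_cons;
                         uint64_t queued; };

/* Front-end: publish one request. Returns 0, or -1 if the ring is full. */
static int frontend_push(struct sketch_frontend *fe, struct sketch_req rq)
{
    struct sketch_ring *r = fe->ring;

    if (r->req_prod - fe->rsp_cons >= RING_SIZE)
        return -1;                   /* too many requests still in flight */
    r->req[r->req_prod & (RING_SIZE - 1)] = rq;
    r->req_prod++;                   /* real code needs a write barrier first */
    return 0;
}

/* Back-end: called when a notification arrives. Drains all new requests,
 * "queues" each one, and posts a response. Returns how many it handled. */
static unsigned backend_on_notify(struct sketch_backend *be)
{
    struct sketch_ring *r;
    unsigned handled = 0;

    assert(be && be->ring);
    r = be->ring;
    while (be->req_cons != r->req_prod) {
        struct sketch_req rq = r->req[be->req_cons & (RING_SIZE - 1)];
        struct sketch_rsp rsp = { .id = rq.id, .status = 0 };

        be->req_cons++;
        be->queued++;                /* stand-in for the tapdisk request queue */
        r->rsp[r->rsp_prod & (RING_SIZE - 1)] = rsp;
        r->rsp_prod++;
        handled++;
    }
    return handled;
}

/* Front-end: consume one response. Returns 0, or -1 if none is pending. */
static int frontend_pop(struct sketch_frontend *fe, struct sketch_rsp *out)
{
    if (fe->rsp_cons == fe->ring->rsp_prod)
        return -1;
    *out = fe->ring->rsp[fe->rsp_cons & (RING_SIZE - 1)];
    fe->rsp_cons++;
    return 0;
}
```

A caller would zero-initialise one `struct sketch_ring`, point a `sketch_frontend` and a `sketch_backend` at it, push requests, and then call `backend_on_notify()` wherever the event channel notification would be delivered. The indices are free-running counters that are masked only when indexing, so "full" and "empty" can be told apart without wasting a slot.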
Thanos Makatos
2013-Jul-15 11:41 UTC
[PATCH 1 of 6 RESEND v2] blktap3/sring: Headers required for compiling the shared ring
This patch introduces the headers required for the shared ring library to compile. The files are:
* io-optimize.h: imported from blktap2
* scheduler.h: imported from blktap2, contains definitions for the tapdisk scheduler, the component that processes events, be they I/O requests or control commands
* tapdisk.h: imported from blktap2, core header file that contains the definition of the I/O request
* tapdisk-image.h: imported from blktap2, contains the definition of the Virtual Disk Image (VDI)
* tapdisk-log.h: imported from blktap2, contains logging declarations
* tapdisk-queue.h: imported from blktap2, contains the tapdisk I/O request and event queue
* tapdisk-server.h: imported from blktap2
* tapdisk-stats.h: imported from blktap2.5, contains statistics helpers
* tapdisk-utils.h: imported from blktap2.5, contains assorted utility functions
* tapdisk-vbd.h: imported from blktap2, contains the definition of the Virtual Block Device (VBD)
This patch also contains the following blktap3-related changes:
* Update the declaration of function tapdisk_image_open_chain, as it now uses the parent /path/to/file instead of the parent minor number.
* Update the declaration of function tapdisk_server_get_vbd, as it now uses type:/path/to/file instead of the minor number.
Signed-off-by: Thanos Makatos <thanos.makatos@citrix.com>
diff --git a/tools/blktap3/drivers/io-optimize.h b/tools/blktap3/drivers/io-optimize.h new file mode 100644 --- /dev/null +++ b/tools/blktap3/drivers/io-optimize.h @@ -0,0 +1,68 @@ +/* + * Copyright (c) 2008, XenSource Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. 
+ * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * * Neither the name of XenSource Inc. nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER + * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, + * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, + * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR + * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING + * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+*/ + +#ifndef __IO_OPTIMIZE_H__ +#define __IO_OPTIMIZE_H__ + +#include <libaio.h> + +struct opio; + +struct opio_list { + struct opio *head; + struct opio *tail; +}; + +struct opio { + char *buf; + unsigned long nbytes; + long long offset; + void *data; + struct iocb *iocb; + struct io_event event; + struct opio *head; + struct opio *next; + struct opio_list list; +}; + +struct opioctx { + int num_opios; + int free_opio_cnt; + struct opio *opios; + struct opio **free_opios; + struct iocb **iocb_queue; + struct io_event *event_queue; +}; + +int opio_init(struct opioctx *ctx, int num_iocbs); +void opio_free(struct opioctx *ctx); +int io_merge(struct opioctx *ctx, struct iocb **queue, int num); +int io_split(struct opioctx *ctx, struct io_event *events, int num); +int io_expand_iocbs(struct opioctx *ctx, struct iocb **queue, int idx, int num); + +#endif diff --git a/tools/blktap3/drivers/scheduler.h b/tools/blktap3/drivers/scheduler.h new file mode 100644 --- /dev/null +++ b/tools/blktap3/drivers/scheduler.h @@ -0,0 +1,88 @@ +/* + * Copyright (c) 2008, XenSource Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * * Neither the name of XenSource Inc. nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. 
+ * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER + * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, + * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, + * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR + * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING + * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ +#ifndef _SCHEDULER_H_ +#define _SCHEDULER_H_ + +#include <sys/select.h> +#include "blktap3.h" + +#define SCHEDULER_POLL_READ_FD 0x1 +#define SCHEDULER_POLL_WRITE_FD 0x2 +#define SCHEDULER_POLL_EXCEPT_FD 0x4 +#define SCHEDULER_POLL_TIMEOUT 0x8 + +typedef int event_id_t; +typedef void (*event_cb_t) (event_id_t id, char mode, void *private); + +typedef struct event { + char mode; + char dead; + char pending; + char masked; + + event_id_t id; + + int fd; + int timeout; + int deadline; + + event_cb_t cb; + void *private; + + /* + * for linked lists + */ + TAILQ_ENTRY(event) entry; +} event_t; + +TAILQ_HEAD(tqh_event, event); + +typedef struct scheduler { + fd_set read_fds; + fd_set write_fds; + fd_set except_fds; + + struct tqh_event events; + + int uuid; + int max_fd; + int timeout; + int max_timeout; + int depth; +} scheduler_t; + +void scheduler_initialize(scheduler_t *); +event_id_t scheduler_register_event(scheduler_t *, char mode, + int fd, int timeout, + event_cb_t cb, void *private); +void scheduler_unregister_event(scheduler_t *, event_id_t); +void scheduler_mask_event(scheduler_t *, event_id_t, int masked); +void scheduler_set_max_timeout(scheduler_t *, int); +int 
scheduler_wait_for_events(scheduler_t *); + +#endif diff --git a/tools/blktap3/drivers/tapdisk-blktap.h b/tools/blktap3/drivers/tapdisk-blktap.h new file mode 100644 --- /dev/null +++ b/tools/blktap3/drivers/tapdisk-blktap.h @@ -0,0 +1,84 @@ +/* + * Copyright (c) 2010, Citrix Systems, Inc. + * + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * * Neither the name of XenSource Inc. nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER + * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, + * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, + * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR + * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING + * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */ + +#ifndef _TAPDISK_BLKTAP_H_ +#define _TAPDISK_BLKTAP_H_ + +typedef struct td_blktap td_blktap_t; +typedef struct td_blktap_req td_blktap_req_t; + +#include "blktap3.h" +#include "tapdisk-vbd.h" + +#if 0 +struct td_blktap_stats { + struct { + unsigned long long in; + unsigned long long out; + } reqs; + struct { + unsigned long long in; + unsigned long long out; + } kicks; +}; +#endif + +struct td_blktap { + int minor; + //td_vbd_t *vbd; + +#if 0 + int fd; +#endif + +#if 0 + void *vma; + size_t vma_size; + + struct blktap_sring *sring; + unsigned int req_cons; + unsigned int rsp_prod_pvt; +#endif + +#if 0 + int event_id; + void *vstart; + + int n_reqs; + td_blktap_req_t *reqs; + int n_reqs_free; + td_blktap_req_t **reqs_free; +#endif + + //TAILQ_ENTRY(td_blktap) entry; + + //struct td_blktap_stats stats; +}; + +#endif /* _TAPDISK_BLKTAP_H_ */ diff --git a/tools/blktap3/drivers/tapdisk-image.h b/tools/blktap3/drivers/tapdisk-image.h new file mode 100644 --- /dev/null +++ b/tools/blktap3/drivers/tapdisk-image.h @@ -0,0 +1,119 @@ +/* + * Copyright (c) 2008, XenSource Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * * Neither the name of XenSource Inc. nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. 
+ * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER + * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, + * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, + * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR + * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING + * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ +#ifndef _TAPDISK_IMAGE_H_ +#define _TAPDISK_IMAGE_H_ + +#include "tapdisk.h" + +TAILQ_HEAD(tqh_td_image_handle, td_image_handle); + +struct td_image_handle { + int type; + char *name; + + td_flag_t flags; + + td_driver_t *driver; + td_disk_info_t info; + + /* + * for linked lists + */ + TAILQ_ENTRY(td_image_handle) entry; + + /* + * Basic datapath statistics, in sectors read/written. + * + * hits: requests completed by this image. + * fail: requests completed with failure by this image. + * + * Not that we do not count e.g. + * miss: requests forwarded. + * total: requests processed by this image. + * + * This is because we'd have to compensate for restarts due to + * -EBUSY conditions. Those can be extrapolated by following + * the chain instead: sum(image[i].hits, i=0..) 
== vbd.secs; + */ + struct { + td_sector_count_t hits; + td_sector_count_t fail; + } stats; +}; + +#define tapdisk_for_each_image(_image, _head) \ + TAILQ_FOREACH(_image, _head, entry) + +#define tapdisk_for_each_image_safe(_image, _next, _head) \ + TAILQ_FOREACH_SAFE(_image, _head, entry, _next) + +#define tapdisk_for_each_image_reverse(_image, _head) \ + TAILQ_FOREACH_REVERSE(_image, _head, tqh_td_image_handle, entry) + +#define tapdisk_image_entry(_head) \ + list_entry(_head, td_image_t, next) + +/** + * Opens an image. + * + * @param type the image type (DISK_TYPE_*) + * @param name TODO ? + * @param flags TODO ? + * @param _image output parameter that receives a handle to the opened image + * @returns 0 on success + */ +int tapdisk_image_open(const int type, const char *name, const int flags, + td_image_t ** _image); + +void tapdisk_image_close(td_image_t *, struct tqh_td_image_handle *); + +/** + * Opens the image chain. + * + * @param params type:/path/to/file + * @param flags + * @param prt_path parent /path/to/file (optional) + * @param head + */ +int tapdisk_image_open_chain(const char *params, int flags, + const char* prt_path, struct tqh_td_image_handle *head); + +/** + * Closes all the images. + */ +void tapdisk_image_close_chain(struct tqh_td_image_handle *); +int tapdisk_image_validate_chain(struct tqh_td_image_handle *); + +td_image_t *tapdisk_image_allocate(const char *, int, td_flag_t); +void tapdisk_image_free(td_image_t *, struct tqh_td_image_handle *head); + +int tapdisk_image_check_td_request(td_image_t *, td_request_t); +int tapdisk_image_check_request(td_image_t *, struct td_vbd_request *); +void tapdisk_image_stats(td_image_t *, td_stats_t *); + +#endif diff --git a/tools/blktap3/drivers/tapdisk-log.h b/tools/blktap3/drivers/tapdisk-log.h new file mode 100644 --- /dev/null +++ b/tools/blktap3/drivers/tapdisk-log.h @@ -0,0 +1,69 @@ +/* + * Copyright (c) 2009, XenSource Inc. + * All rights reserved. 
+ * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * * Neither the name of XenSource Inc. nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER + * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, + * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, + * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR + * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING + * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ +#ifndef _TAPDISK_LOG_H_ +#define _TAPDISK_LOG_H_ + +#define TLOG_WARN 0 +#define TLOG_INFO 1 +#define TLOG_DBG 2 + +#define TLOG_DIR "/var/log/blktap" + +#include <stdarg.h> +#include "blktap3.h" + +int tlog_open(const char *, int, int); +void tlog_close(void); +void tlog_precious(void); +void tlog_vsyslog(int, const char *, va_list); +void tlog_syslog(int, const char *, ...); + +#include <syslog.h> + +#define EPRINTF(_f, _a...) 
syslog(LOG_ERR, "%s:%d " _f, __FILE__, __LINE__, \ + ##_a) +#define DPRINTF(_f, _a...) syslog(LOG_INFO, "%s:%d "_f, __FILE__, __LINE__, \ + ##_a) +#define PERROR(_f, _a...) EPRINTF(_f ": %s", ##_a, strerror(errno)) + +void __tlog_write(int, const char *, ...) __printf(2, 3); +void __tlog_error(const char *fmt, ...) __printf(1, 2); + +#define tlog_write(_level, _f, _a...) \ + __tlog_write(_level, "%s: " _f, __func__, ##_a) + +#define tlog_error(_err, _f, _a...) \ + __tlog_error("ERROR: errno %d at %s: " _f, \ + (int)_err, __func__, ##_a) + +#define tlog_drv_error(_drv, _err, _f, _a ...) do { \ + if (tapdisk_driver_log_pass(_drv, __func__)) \ + tlog_error(_err, _f, ##_a); \ +} while (0) + +#endif /* __TAPDISK_LOG_H__ */ diff --git a/tools/blktap3/drivers/tapdisk-queue.h b/tools/blktap3/drivers/tapdisk-queue.h new file mode 100644 --- /dev/null +++ b/tools/blktap3/drivers/tapdisk-queue.h @@ -0,0 +1,125 @@ +/* + * Copyright (c) 2008, XenSource Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * * Neither the name of XenSource Inc. nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT OWNER + * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, + * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, + * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR + * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING + * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. +*/ + +#ifndef TAPDISK_QUEUE_H +#define TAPDISK_QUEUE_H + +#include <libaio.h> + +#include "io-optimize.h" +#include "scheduler.h" + +struct tiocb; +struct tfilter; + +typedef void (*td_queue_callback_t)(void *arg, struct tiocb *, int err); + + +struct tiocb { + td_queue_callback_t cb; + void *arg; + + struct iocb iocb; + struct tiocb *next; +}; + +struct tlist { + struct tiocb *head; + struct tiocb *tail; +}; + +struct tqueue { + int size; + + const struct tio *tio; + void *tio_data; + + struct opioctx opioctx; + + int queued; + struct iocb **iocbs; + + /* number of iocbs pending in the aio layer */ + int iocbs_pending; + + /* number of tiocbs pending in the queue -- + * this is likely to be larger than iocbs_pending + * due to request coalescing */ + int tiocbs_pending; + + /* iocbs may be deferred if the aio ring is full. + * tapdisk_queue_complete will ensure deferred + * iocbs are queued as slots become available. 
*/ + struct tlist deferred; + int tiocbs_deferred; + + /* optional tapdisk filter */ + struct tfilter *filter; + + uint64_t deferrals; +}; + +struct tio { + const char *name; + size_t data_size; + + int (*tio_setup) (struct tqueue *queue, int qlen); + void (*tio_destroy) (struct tqueue *queue); + int (*tio_submit) (struct tqueue *queue); +}; + +enum { + TIO_DRV_LIO = 1, + TIO_DRV_RWIO = 2, +}; + +/* + * Interface for request producer (i.e., tapdisk) + * NB: the following functions may cause additional tiocbs to be queued: + * - tapdisk_submit_tiocbs + * - tapdisk_cancel_tiocbs + * - tapdisk_complete_tiocbs + * The *_all_tiocbs variants will handle the first two cases; + * be sure to call submit after calling complete in the third case. + */ +#define tapdisk_queue_count(q) ((q)->queued) +#define tapdisk_queue_empty(q) ((q)->queued == 0) +#define tapdisk_queue_full(q) \ + (((q)->tiocbs_pending + (q)->queued) >= (q)->size) +int tapdisk_init_queue(struct tqueue *, int size, int drv, struct tfilter *); +void tapdisk_free_queue(struct tqueue *); +void tapdisk_debug_queue(struct tqueue *); +void tapdisk_queue_tiocb(struct tqueue *, struct tiocb *); +int tapdisk_submit_tiocbs(struct tqueue *); +int tapdisk_submit_all_tiocbs(struct tqueue *); +int tapdisk_cancel_tiocbs(struct tqueue *); +int tapdisk_cancel_all_tiocbs(struct tqueue *); +void tapdisk_prep_tiocb(struct tiocb *, int, int, char *, size_t, + long long, td_queue_callback_t, void *); + +#endif diff --git a/tools/blktap3/drivers/tapdisk-server.h b/tools/blktap3/drivers/tapdisk-server.h new file mode 100644 --- /dev/null +++ b/tools/blktap3/drivers/tapdisk-server.h @@ -0,0 +1,76 @@ +/* + * Copyright (c) 2008, XenSource Inc. + * All rights reserved. 
+ * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * * Neither the name of XenSource Inc. nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER + * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, + * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, + * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR + * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING + * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ +#ifndef _TAPDISK_SERVER_H_ +#define _TAPDISK_SERVER_H_ + +#include "tapdisk-vbd.h" +#include "tapdisk-queue.h" + +struct tap_disk *tapdisk_server_find_driver_interface(int); + +td_image_t *tapdisk_server_get_shared_image(td_image_t *); + +struct tqh_td_vbd_handle *tapdisk_server_get_all_vbds(void); + +/** + * Returns the VBD that corresponds to the specified type:/path/to/file. + * Returns NULL if such a VBD does not exist. 
+ */ +td_vbd_t *tapdisk_server_get_vbd(const char *params); + +/** + * Adds the VBD to end of the list of VBDs. + */ +void tapdisk_server_add_vbd(td_vbd_t *); + +/** + * Removes the VBDs from the list of VBDs. + */ +void tapdisk_server_remove_vbd(td_vbd_t *); + +void tapdisk_server_queue_tiocb(struct tiocb *); + +void tapdisk_server_check_state(void); + +event_id_t tapdisk_server_register_event(char, int, int, event_cb_t, void *); +void tapdisk_server_unregister_event(event_id_t); +void tapdisk_server_mask_event(event_id_t, int); +void tapdisk_server_set_max_timeout(int); + +int tapdisk_server_init(void); +int tapdisk_server_initialize(const char *, const char *); +int tapdisk_server_complete(void); +int tapdisk_server_run(void); +void tapdisk_server_iterate(void); + +int tapdisk_server_openlog(const char *, int, int); +void tapdisk_server_closelog(void); +void tapdisk_start_logging(const char *, const char *); +void tapdisk_stop_logging(void); + +#endif diff --git a/tools/blktap3/drivers/tapdisk-stats.h b/tools/blktap3/drivers/tapdisk-stats.h new file mode 100644 --- /dev/null +++ b/tools/blktap3/drivers/tapdisk-stats.h @@ -0,0 +1,70 @@ +/* + * Copyright (c) 2010, Citrix Systems, Inc. + * + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * * Neither the name of XenSource Inc. nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. 
+ * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER + * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, + * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, + * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR + * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING + * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef _TAPDISK_STATS_H_ +#define _TAPDISK_STATS_H_ + +#include <string.h> + +#define TD_STATS_MAX_DEPTH 8 + +struct tapdisk_stats_ctx { + void *pos; + + void *buf; + size_t size; + + int n_elem[TD_STATS_MAX_DEPTH]; + int depth; +}; + +typedef struct tapdisk_stats_ctx td_stats_t; + +static inline void +tapdisk_stats_init(td_stats_t * st, char *buf, size_t size) +{ + memset(st, 0, sizeof(*st)); + + st->pos = buf; + st->buf = buf; + st->size = size; +} + +static inline size_t tapdisk_stats_length(td_stats_t * st) +{ + return st->pos - st->buf; +} + +void tapdisk_stats_enter(td_stats_t * st, char t); +void tapdisk_stats_leave(td_stats_t * st, char t); +void tapdisk_stats_field(td_stats_t * st, const char *key, + const char *conv, ...); +void tapdisk_stats_val(td_stats_t * st, const char *conv, ...); + +#endif /* _TAPDISK_STATS_H_ */ diff --git a/tools/blktap3/drivers/tapdisk-utils.h b/tools/blktap3/drivers/tapdisk-utils.h new file mode 100644 --- /dev/null +++ b/tools/blktap3/drivers/tapdisk-utils.h @@ -0,0 +1,49 @@ +/* + * Copyright (c) 2008, XenSource Inc. + * All rights reserved. 
+ * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * * Neither the name of XenSource Inc. nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER + * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, + * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, + * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR + * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING + * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ */ +#ifndef _TAPDISK_UTILS_H_ +#define _TAPDISK_UTILS_H_ + +#include <inttypes.h> +#include <sys/time.h> +#include <stddef.h> + +#define MAX_NAME_LEN 1000 +#define TD_SYSLOG_IDENT_MAX 32 +#define TD_SYSLOG_STRTIME_LEN 15 + +int tapdisk_syslog_facility(const char *); +char *tapdisk_syslog_ident(const char *); +size_t tapdisk_syslog_strftime(char *, size_t, const struct timeval *); +size_t tapdisk_syslog_strftv(char *, size_t, const struct timeval *); +int tapdisk_set_resource_limits(void); +int tapdisk_namedup(char **, const char *); +int tapdisk_parse_disk_type(const char *, char **, int *); +int tapdisk_get_image_size(int, uint64_t *, uint32_t *); +int tapdisk_linux_version(void); + +#endif diff --git a/tools/blktap2/drivers/tapdisk-vbd.h b/tools/blktap3/drivers/tapdisk-vbd.h copy from tools/blktap2/drivers/tapdisk-vbd.h copy to tools/blktap3/drivers/tapdisk-vbd.h --- a/tools/blktap2/drivers/tapdisk-vbd.h +++ b/tools/blktap3/drivers/tapdisk-vbd.h @@ -1,4 +1,4 @@ -/* +/* * Copyright (c) 2008, XenSource Inc. * All rights reserved. * @@ -24,18 +24,18 @@ * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
-*/ + */ #ifndef _TAPDISK_VBD_H_ #define _TAPDISK_VBD_H_ #include <sys/time.h> -#include <xenctrl.h> -#include <xen/io/blkif.h> #include "tapdisk.h" #include "scheduler.h" #include "tapdisk-image.h" +#include "sring/td-blkif.h" +#define TD_VBD_REQUEST_TIMEOUT 120 #define TD_VBD_MAX_RETRIES 100 #define TD_VBD_RETRY_INTERVAL 1 @@ -47,75 +47,54 @@ #define TD_VBD_PAUSED 0x0020 #define TD_VBD_SHUTDOWN_REQUESTED 0x0040 #define TD_VBD_LOCKING 0x0080 -#define TD_VBD_RETRY_NEEDED 0x0100 -#define TD_VBD_LOG_DROPPED 0x0200 +#define TD_VBD_LOG_DROPPED 0x0100 -typedef struct td_ring td_ring_t; -typedef struct td_vbd_request td_vbd_request_t; -typedef struct td_vbd_driver_info td_vbd_driver_info_t; -typedef struct td_vbd_handle td_vbd_t; -typedef void (*td_vbd_cb_t) (void *, blkif_response_t *); +#define TD_VBD_SECONDARY_DISABLED 0 +#define TD_VBD_SECONDARY_MIRROR 1 +#define TD_VBD_SECONDARY_STANDBY 2 -struct td_ring { - int fd; - char *mem; - blkif_sring_t *sring; - blkif_back_ring_t fe_ring; - unsigned long vstart; -}; - -struct td_vbd_request { - blkif_request_t req; - int16_t status; - - int error; - int blocked; /* blocked on a dependency */ - int submitting; - int secs_pending; - int num_retries; - struct timeval last_try; - - td_vbd_t *vbd; - struct list_head next; -}; - -struct td_vbd_driver_info { - char *params; - int type; - struct list_head next; -}; +TAILQ_HEAD(tqh_td_vbd_handle, td_vbd_handle); struct td_vbd_handle { + /** + * type:/path/to/file + */ char *name; - td_uuid_t uuid; - int minor; + /** + * shared ring + */ + struct td_xenblkif *tap; - struct list_head driver_stack; - - int storage; - - uint8_t reopened; - uint8_t reactivated; td_flag_t flags; td_flag_t state; - struct list_head images; + struct tqh_td_image_handle images; - struct list_head new_requests; - struct list_head pending_requests; - struct list_head failed_requests; - struct list_head completed_requests; + int parent_devnum; + char *secondary_name; + td_image_t *secondary; + uint8_t 
secondary_mode; - td_vbd_request_t request_list[MAX_REQUESTS]; + /* TODO ??? */ + int FIXME_enospc_redirect_count_enabled; + uint64_t FIXME_enospc_redirect_count; - td_ring_t ring; - event_id_t ring_event_id; + /* when we encounter ENOSPC on the primary leaf image in mirror mode, + * we need to remove it from the VBD chain so that writes start going + * on the secondary leaf. However, we cannot free the image at that + * time since it might still have in-flight treqs referencing it. + * Therefore, we move it into 'retired' until shutdown. */ + td_image_t *retired; - td_vbd_cb_t callback; - void *argument; + struct tqh_td_vbd_request new_requests; + struct tqh_td_vbd_request pending_requests; + struct tqh_td_vbd_request failed_requests; + struct tqh_td_vbd_request completed_requests; - struct list_head next; + td_vbd_request_t request_list[MAX_REQUESTS]; /* XXX */ + + TAILQ_ENTRY(td_vbd_handle) entry; struct timeval ts; @@ -125,83 +104,119 @@ struct td_vbd_handle { uint64_t secs_pending; uint64_t retries; uint64_t errors; + td_sector_count_t secs; }; #define tapdisk_vbd_for_each_request(vreq, tmp, list) \ - list_for_each_entry_safe((vreq), (tmp), (list), next) + TAILQ_FOREACH_SAFE((vreq), (list), next, (tmp)) #define tapdisk_vbd_for_each_image(vbd, image, tmp) \ - list_for_each_entry_safe((image), (tmp), &(vbd)->images, next) + tapdisk_for_each_image_safe(image, tmp, &vbd->images) +/** + * Removes the request from its current queue and inserts it to the specified + * one. 
+ */ static inline void -tapdisk_vbd_move_request(td_vbd_request_t *vreq, struct list_head *dest) +tapdisk_vbd_move_request(td_vbd_request_t * vreq, + struct tqh_td_vbd_request *dest) { - list_del(&vreq->next); - INIT_LIST_HEAD(&vreq->next); - list_add_tail(&vreq->next, dest); + assert(vreq); + assert(dest); + + assert(vreq->list_head); + TAILQ_REMOVE(vreq->list_head, vreq, next); + + TAILQ_INSERT_TAIL(dest, vreq, next); + + vreq->list_head = dest; } static inline void tapdisk_vbd_add_image(td_vbd_t *vbd, td_image_t *image) { - list_add_tail(&image->next, &vbd->images); + TAILQ_INSERT_TAIL(&vbd->images, image, entry); } static inline int tapdisk_vbd_is_last_image(td_vbd_t *vbd, td_image_t *image) { - return list_is_last(&image->next, &vbd->images); + return TAILQ_LAST(&vbd->images, tqh_td_image_handle) == image; } -td_image_t * -tapdisk_vbd_first_image(td_vbd_t *vbd); +/** + * Retrieves the first image of this VBD. + */ +static inline td_image_t * +tapdisk_vbd_first_image(td_vbd_t *vbd) +{ + td_image_t *image = NULL; + if (!TAILQ_EMPTY(&vbd->images)) + image = TAILQ_FIRST(&vbd->images); + return image; +} static inline td_image_t * tapdisk_vbd_last_image(td_vbd_t *vbd) { - return list_entry(vbd->images.prev, td_image_t, next); + td_image_t *image = NULL; + if (!TAILQ_EMPTY(&vbd->images)) + image = TAILQ_LAST(&vbd->images, tqh_td_image_handle); + return image; } -static inline td_image_t * -tapdisk_vbd_next_image(td_image_t *image) +static inline td_image_t *tapdisk_vbd_next_image(td_image_t * image) { - return list_entry(image->next.next, td_image_t, next); + return TAILQ_NEXT(image, entry); } -td_vbd_t *tapdisk_vbd_create(td_uuid_t); -int tapdisk_vbd_initialize(td_uuid_t); -void tapdisk_vbd_set_callback(td_vbd_t *, td_vbd_cb_t, void *); -int tapdisk_vbd_parse_stack(td_vbd_t *vbd, const char *path); -int tapdisk_vbd_open(td_vbd_t *, const char *, uint16_t, - uint16_t, int, const char *, td_flag_t); +td_vbd_t *tapdisk_vbd_create(void); + +int 
tapdisk_vbd_initialize(int rfd, int wfd, const char * params); + +/* + * XXX Function definition commented out in blktap2.5. + */ +int tapdisk_vbd_open(td_vbd_t *, const char *, int, const char *, td_flag_t); + int tapdisk_vbd_close(td_vbd_t *); -void tapdisk_vbd_free(td_vbd_t *); -void tapdisk_vbd_free_stack(td_vbd_t *); -int tapdisk_vbd_open_stack(td_vbd_t *, uint16_t, td_flag_t); -int tapdisk_vbd_open_vdi(td_vbd_t *, const char *, - uint16_t, uint16_t, td_flag_t); +/** + * Opens a VDI. + * + * @param vbd output parameter that receives a handle to the opened VDI + * @param params type:/path/to/file + * @param flags TD_OPEN_* TODO which TD_OPEN_* flags are honored? How does + * each flag affect the behavior of this function? Move TD_OPEN_* flag + * definitions close to this function (check if they're used only by this + * function)? + * @param prt_path parent /path/to/file (optional) + * @returns 0 on success + */ +int tapdisk_vbd_open_vdi(td_vbd_t * vbd, const char *params, td_flag_t flags, + const char * prt_path); + +/** + * Closes a VDI. 
+ */ void tapdisk_vbd_close_vdi(td_vbd_t *); -int tapdisk_vbd_attach(td_vbd_t *, const char *, int); -void tapdisk_vbd_detach(td_vbd_t *); - +int tapdisk_vbd_queue_request(td_vbd_t *, td_vbd_request_t *); void tapdisk_vbd_forward_request(td_request_t); -int tapdisk_vbd_get_image_info(td_vbd_t *, image_t *); -int tapdisk_vbd_queue_ready(td_vbd_t *); +int tapdisk_vbd_get_disk_info(td_vbd_t *, td_disk_info_t *); int tapdisk_vbd_retry_needed(td_vbd_t *); int tapdisk_vbd_quiesce_queue(td_vbd_t *); int tapdisk_vbd_start_queue(td_vbd_t *); int tapdisk_vbd_issue_requests(td_vbd_t *); int tapdisk_vbd_kill_queue(td_vbd_t *); int tapdisk_vbd_pause(td_vbd_t *); -int tapdisk_vbd_resume(td_vbd_t *, const char *, uint16_t); -int tapdisk_vbd_kick(td_vbd_t *); +int tapdisk_vbd_resume(td_vbd_t *, const char *); +void tapdisk_vbd_kick(td_vbd_t *); void tapdisk_vbd_check_state(td_vbd_t *); +int tapdisk_vbd_recheck_state(td_vbd_t *); void tapdisk_vbd_check_progress(td_vbd_t *); void tapdisk_vbd_debug(td_vbd_t *); - -void tapdisk_vbd_complete_vbd_request(td_vbd_t *, td_vbd_request_t *); +void tapdisk_vbd_stats(td_vbd_t *, td_stats_t *); #endif diff --git a/tools/blktap3/drivers/tapdisk.h b/tools/blktap3/drivers/tapdisk.h new file mode 100644 --- /dev/null +++ b/tools/blktap3/drivers/tapdisk.h @@ -0,0 +1,251 @@ +/* + * Copyright (c) 2007, XenSource Inc. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions are met: + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in the + * documentation and/or other materials provided with the distribution. + * * Neither the name of XenSource Inc. 
nor the names of its contributors + * may be used to endorse or promote products derived from this software + * without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER + * OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, + * EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, + * PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR + * PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF + * LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING + * NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS + * SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + * + * Some notes on the tap_disk interface: + * + * tap_disk aims to provide a generic interface to easily implement new + * types of image accessors. The structure-of-function-calls is similar + * to disk interfaces used in qemu/denali/etc, with the significant + * difference being the expectation of asynchronous rather than synchronous + * I/O. The asynchronous interface is intended to allow lots of requests to + * be pipelined through a disk, without the disk requiring any of its own + * threads of control. As such, a batch of requests is delivered to the disk + * using: + * + * td_queue_[read,write]() + * + * and passing in a completion callback, which the disk is responsible for + * tracking. Disks should transform these requests as necessary and return + * the resulting iocbs to tapdisk using td_prep_[read,write]() and + * td_queue_tiocb(). + * + * NOTE: tapdisk uses the number of sectors submitted per request as a + * ref count. 
Plugins must use the callback function to communicate the + * completion -- or error -- of every sector submitted to them. + * + * td_get_parent_id returns: + * 0 if parent id successfully retrieved + * TD_NO_PARENT if no parent exists + * -errno on error + */ + +#ifndef _TAPDISK_H_ +#define _TAPDISK_H_ + +#include <sys/time.h> +#include <stdint.h> +#include <assert.h> + +// XXX? +//#include "blktaplib.h" +#include "blktap3.h" + +// TODO necessary? +#include "tapdisk-log.h" +#include "tapdisk-utils.h" +#include "tapdisk-stats.h" + +#define ARRAY_SIZE(a) (sizeof(a) / sizeof(a)[0]) + +#define MAX_SEGMENTS_PER_REQ 11 +#define MAX_REQUESTS 32U +#define SECTOR_SHIFT 9 +#define DEFAULT_SECTOR_SIZE 512 + +#define TAPDISK_DATA_REQUESTS (MAX_REQUESTS * MAX_SEGMENTS_PER_REQ) + +//#define BLK_NOT_ALLOCATED (-99) +#define TD_NO_PARENT 1 + +#define MAX_RAMDISK_SIZE 1024000 /*500MB disk limit */ + +#define TD_OP_READ 0 +#define TD_OP_WRITE 1 + +#define TD_OPEN_QUIET 0x00001 +#define TD_OPEN_QUERY 0x00002 +#define TD_OPEN_RDONLY 0x00004 +#define TD_OPEN_STRICT 0x00008 +#define TD_OPEN_SHAREABLE 0x00010 +#define TD_OPEN_ADD_CACHE 0x00020 +#define TD_OPEN_VHD_INDEX 0x00040 +#define TD_OPEN_LOG_DIRTY 0x00080 +#define TD_OPEN_LOCAL_CACHE 0x00100 +#define TD_OPEN_REUSE_PARENT 0x00200 +#define TD_OPEN_SECONDARY 0x00400 +#define TD_OPEN_STANDBY 0x00800 +#define TD_IGNORE_ENOSPC 0x01000 + +#define TD_CREATE_SPARSE 0x00001 +#define TD_CREATE_MULTITYPE 0x00002 + +#define td_flag_set(word, flag) ((word) |= (flag)) +#define td_flag_clear(word, flag) ((word) &= ~(flag)) +#define td_flag_test(word, flag) ((word) & (flag)) + +typedef uint16_t td_uuid_t; +typedef uint32_t td_flag_t; +typedef uint64_t td_sector_t; +typedef struct td_disk_id td_disk_id_t; +typedef struct td_disk_info td_disk_info_t; +typedef struct td_request td_request_t; +typedef struct td_driver_handle td_driver_t; +typedef struct td_image_handle td_image_t; +typedef struct td_sector_count td_sector_count_t; +typedef struct 
td_vbd_request td_vbd_request_t; +typedef struct td_vbd_handle td_vbd_t; + +/* + * Prototype of the callback to activate as requests complete. + */ +typedef void (*td_callback_t) (td_request_t, int); +typedef void (*td_vreq_callback_t) (td_vbd_request_t *, int, void *, int); + +struct td_disk_id { + char *name; + int type; + int flags; +}; + +struct td_disk_info { + td_sector_t size; + long sector_size; + uint32_t info; +}; + + +TAILQ_HEAD(tqh_td_vbd_request, td_vbd_request); + +struct td_vbd_request { + + /** + * Operation: read/write + */ + int op; + + /** + * Start sector. + */ + td_sector_t sec; + + /** + * Scatter/gather list. + */ + struct td_iovec *iov; + int iovcnt; + + /** + * Completion callback. + */ + td_vreq_callback_t cb; + + void *token; + const char *name; + + int error; + int prev_error; + + int submitting; + int secs_pending; + int num_retries; + struct timeval ts; + struct timeval last_try; + + /** + * VBD this request belongs to. + */ + td_vbd_t *vbd; + + /* + * linked list of struct td_vbd_request + */ + TAILQ_ENTRY(td_vbd_request) next; + + /* + * TODO list head of what? + */ + struct tqh_td_vbd_request *list_head; +}; + +/* TODO why have these two types (td_request and td_vbd_request) separate? */ +struct td_request { + int op; + void *buf; + + td_sector_t sec; + int secs; + + td_image_t *image; + + td_callback_t cb; + void *cb_data; + + int sidx; + td_vbd_request_t *vreq; +}; + +/* + * Structure describing the interface to a virtual disk implementation. + * See note at the top of this file describing this interface. 
+ */ +struct tap_disk { + const char *disk_type; + td_flag_t flags; + int private_data_size; + int (*td_open) (td_driver_t *, const char *, td_flag_t); + int (*td_close) (td_driver_t *); + int (*td_get_parent_id) (td_driver_t *, td_disk_id_t *); + int (*td_validate_parent) (td_driver_t *, td_driver_t *, td_flag_t); + void (*td_queue_read) (td_driver_t *, td_request_t); + void (*td_queue_write) (td_driver_t *, td_request_t); + void (*td_debug) (td_driver_t *); + void (*td_stats) (td_driver_t *, td_stats_t *); +}; + +struct td_sector_count { + td_sector_t rd; + td_sector_t wr; +}; + +static inline void +td_sector_count_add(td_sector_count_t * s, td_sector_t v, int write) +{ + if (write) + s->wr += v; + else + s->rd += v; +} + +void td_panic(void); + +/* TODO why not use struct iovec? */ +struct td_iovec { + void *base; + unsigned int secs; +}; + +#endif /* __TAPDISK_H__ */
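The switch from struct list_head to TAILQ queues in tapdisk-vbd.h is easier to follow with a small standalone example. The following is only a minimal sketch of the pattern used by tapdisk_vbd_move_request(), assuming a BSD-style <sys/queue.h>; the demo_* names are hypothetical stand-ins and are not part of the patch. As in the patch, each request remembers the queue it currently sits on (list_head), so moving it does not require the caller to name the source queue.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical queue head type, standing in for tqh_td_vbd_request. */
TAILQ_HEAD(demo_queue, demo_req);

/* Hypothetical, trimmed-down stand-in for struct td_vbd_request. */
struct demo_req {
    int id;
    TAILQ_ENTRY(demo_req) next;     /* linkage within a queue */
    struct demo_queue *list_head;   /* queue the request currently sits on */
};

/* Appends a request to a queue and records which queue owns it. */
static void demo_queue_add(struct demo_queue *q, struct demo_req *r)
{
    TAILQ_INSERT_TAIL(q, r, next);
    r->list_head = q;
}

/* Same shape as tapdisk_vbd_move_request(): unlink from the current
 * queue, append to the destination, update the back-pointer. */
static void demo_move_request(struct demo_req *r, struct demo_queue *dest)
{
    assert(r);
    assert(dest);
    assert(r->list_head);

    TAILQ_REMOVE(r->list_head, r, next);
    TAILQ_INSERT_TAIL(dest, r, next);
    r->list_head = dest;
}

/* Counts the entries of a queue by walking it. */
static int demo_queue_len(struct demo_queue *q)
{
    struct demo_req *r;
    int n = 0;

    TAILQ_FOREACH(r, q, next)
        n++;
    return n;
}
```

A caller would initialise two queues with TAILQ_INIT(), add requests to the first with demo_queue_add(), and later call demo_move_request() to shift one to the second, much as the VBD moves requests between new_requests, pending_requests, failed_requests and completed_requests.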
Thanos Makatos
2013-Jul-15 11:41 UTC
[PATCH 2 of 6 RESEND v2] blktap3/sring: connect to/disconnect from the shared ring
This patch introduces the functions that allow tapdisk to connect to/disconnect from the shared ring. They are message handlers executed when the tapback daemon sends a TAPDISK_MESSAGE_XENBLKIF_CONNECT/DISCONNECT message to the tapdisk, as a result of running the Xenbus protocol. The connection to the ring is effectively established by mapping the grant references and binding to the guest domain's event channel port (both the grant references and the port are supplied by the tapback daemon). After the connection is established, a callback is registered to be executed when a notification for the ring arrives. Signed-off-by: Thanos Makatos <thanos.makatos@citrix.com> --- Changed since v1: * Make the VBD also point to the internal sring structure when the tapdisk connects to the sring and clear it when it disconnects. The VBD needs to point to the internal sring structure in order to retrieve stats. diff --git a/tools/blktap3/drivers/sring/blkif.h b/tools/blktap3/drivers/sring/blkif.h new file mode 100644 --- /dev/null +++ b/tools/blktap3/drivers/sring/blkif.h @@ -0,0 +1,87 @@ +/* + * Copyright (C) 2012 Citrix Ltd. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, + * USA. + */ + +#ifndef __SRING_BLKIF_H__ +#define __SRING_BLKIF_H__ + +#include <xen/io/blkif.h> + +/* Not a real protocol. 
Used to generate ring structs which contain + * the elements common to all protocols only. This way we get a + * compiler-checkable way to use common struct elements, so we can + * avoid using switch(protocol) in a number of places. */ +struct blkif_common_request { + char dummy; +}; +struct blkif_common_response { + char dummy; +}; + +/* i386 protocol version */ +#pragma pack(push, 4) +struct blkif_x86_32_request { + uint8_t operation; /* BLKIF_OP_??? */ + uint8_t nr_segments; /* number of segments */ + blkif_vdev_t handle; /* only for read/write requests */ + uint64_t id; /* private guest value, echoed in resp */ + blkif_sector_t sector_number; /* start sector idx on disk (r/w only) */ + struct blkif_request_segment seg[BLKIF_MAX_SEGMENTS_PER_REQUEST]; +}; +struct blkif_x86_32_response { + uint64_t id; /* copied from request */ + uint8_t operation; /* copied from request */ + int16_t status; /* BLKIF_RSP_??? */ +}; +typedef struct blkif_x86_32_request blkif_x86_32_request_t; +typedef struct blkif_x86_32_response blkif_x86_32_response_t; +#pragma pack(pop) + +/* x86_64 protocol version */ +struct blkif_x86_64_request { + uint8_t operation; /* BLKIF_OP_??? */ + uint8_t nr_segments; /* number of segments */ + blkif_vdev_t handle; /* only for read/write requests */ + uint64_t __attribute__ ((__aligned__(8))) id; + blkif_sector_t sector_number; /* start sector idx on disk (r/w only) */ + struct blkif_request_segment seg[BLKIF_MAX_SEGMENTS_PER_REQUEST]; +}; +struct blkif_x86_64_response { + uint64_t __attribute__ ((__aligned__(8))) id; + uint8_t operation; /* copied from request */ + int16_t status; /* BLKIF_RSP_??? 
*/ +}; +typedef struct blkif_x86_64_request blkif_x86_64_request_t; +typedef struct blkif_x86_64_response blkif_x86_64_response_t; + +DEFINE_RING_TYPES(blkif_common, struct blkif_common_request, + struct blkif_common_response); +DEFINE_RING_TYPES(blkif_x86_32, struct blkif_x86_32_request, + struct blkif_x86_32_response); +DEFINE_RING_TYPES(blkif_x86_64, struct blkif_x86_64_request, + struct blkif_x86_64_response); + +union blkif_back_rings { + blkif_back_ring_t native; + blkif_common_back_ring_t common; + blkif_x86_32_back_ring_t x86_32; + blkif_x86_64_back_ring_t x86_64; +}; +typedef union blkif_back_rings blkif_back_rings_t; + +#endif /* __SRING_BLKIF_H__ */ diff --git a/tools/blktap3/drivers/sring/td-blkif.c b/tools/blktap3/drivers/sring/td-blkif.c new file mode 100644 --- /dev/null +++ b/tools/blktap3/drivers/sring/td-blkif.c @@ -0,0 +1,231 @@ +/* + * Copyright (C) 2012 Citrix Ltd. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, + * USA. 
+ */ + +#include <assert.h> +#include <stdlib.h> +#include <errno.h> +#include <syslog.h> +#include <sys/mman.h> + +#include "blktap3.h" +#include "tapdisk.h" + +#include "td-blkif.h" +#include "td-ctx.h" +#include "td-req.h" + +struct td_xenblkif * +tapdisk_xenblkif_find(const domid_t domid, const int devid) +{ + struct td_xenblkif *blkif = NULL; + struct td_xenio_ctx *ctx; + + tapdisk_xenio_for_each_ctx(ctx) { + tapdisk_xenio_ctx_find_blkif(ctx, blkif, + blkif->domid == domid && + blkif->devid == devid); + if (blkif) + return blkif; + } + + return NULL; +} + +void +tapdisk_xenblkif_destroy(struct td_xenblkif * blkif) +{ + assert(blkif); + + tapdisk_xenblkif_reqs_free(blkif); + + if (blkif->ctx) { + if (blkif->port >= 0) + xc_evtchn_unbind(blkif->ctx->xce_handle, blkif->port); + + if (blkif->rings.common.sring) + xc_gnttab_munmap(blkif->ctx->xcg_handle, blkif->rings.common.sring, + blkif->ring_n_pages); + + TAILQ_REMOVE(&blkif->ctx->blkifs, blkif, entry); + tapdisk_xenio_ctx_put(blkif->ctx); + } + + free(blkif); +} + +int +tapdisk_xenblkif_disconnect(const domid_t domid, const int devid) +{ + struct td_xenblkif *blkif; + + blkif = tapdisk_xenblkif_find(domid, devid); + if (!blkif) + return ESRCH; + + if (blkif->n_reqs_free != blkif->ring_size) + return EBUSY; + + blkif->vbd->tap = NULL; + + tapdisk_xenblkif_destroy(blkif); + + return 0; +} + +int +tapdisk_xenblkif_connect(domid_t domid, int devid, const grant_ref_t * grefs, + int order, evtchn_port_t port, int proto, const char *pool, + td_vbd_t * vbd) +{ + struct td_xenblkif *td_blkif = NULL; + struct td_xenio_ctx *td_ctx; + int err; + unsigned int i; + void *sring; + size_t sz; + + assert(grefs); + assert(vbd); + + /* + * Already connected? 
+ */ + if (tapdisk_xenblkif_find(domid, devid)) { + /* TODO log error */ + return EEXIST; + } + + err = tapdisk_xenio_ctx_get(pool, &td_ctx); + if (err) { + /* TODO log error */ + goto fail; + } + + td_blkif = calloc(1, sizeof(*td_blkif)); + if (!td_blkif) { + /* TODO log error */ + err = errno; + goto fail; + } + + td_blkif->domid = domid; + td_blkif->devid = devid; + td_blkif->vbd = vbd; + td_blkif->ctx = td_ctx; + td_blkif->proto = proto; + + /* + * Create the shared ring. + */ + td_blkif->ring_n_pages = 1 << order; + if (td_blkif->ring_n_pages > ARRAY_SIZE(td_blkif->ring_ref)) { + syslog(LOG_ERR, "too many pages (%u), max %lu\n", + td_blkif->ring_n_pages, ARRAY_SIZE(td_blkif->ring_ref)); + err = EINVAL; + goto fail; + } + + /* + * TODO Why don't we just keep a copy of the array's address? There should + * be a reason for copying the addresses of the pages, figure out why. + * TODO Why do we even store it in the td_blkif in the first place? + */ + for (i = 0; i < td_blkif->ring_n_pages; i++) + td_blkif->ring_ref[i] = grefs[i]; + + /* + * Map the grant references that will be holding the request descriptors. + */ + sring = xc_gnttab_map_domain_grant_refs(td_blkif->ctx->xcg_handle, + td_blkif->ring_n_pages, td_blkif->domid, td_blkif->ring_ref, + PROT_READ | PROT_WRITE); + if (!sring) { + err = errno; + syslog(LOG_ERR, "failed to map domain's %d grant references: %s\n", + domid, strerror(err)); + goto fail; + } + + /* + * Size of the ring, in bytes. + */ + sz = XC_PAGE_SIZE << order; + + /* + * Initialize the mapped address into the shared ring. + * + * TODO Check for protocol support in the beginning of this function. 
+ */ + switch (td_blkif->proto) { + case BLKIF_PROTOCOL_NATIVE: + { + blkif_sring_t *__sring = sring; + BACK_RING_INIT(&td_blkif->rings.native, __sring, sz); + break; + } + case BLKIF_PROTOCOL_X86_32: + { + blkif_x86_32_sring_t *__sring = sring; + BACK_RING_INIT(&td_blkif->rings.x86_32, __sring, sz); + break; + } + case BLKIF_PROTOCOL_X86_64: + { + blkif_x86_64_sring_t *__sring = sring; + BACK_RING_INIT(&td_blkif->rings.x86_64, __sring, sz); + break; + } + default: + syslog(LOG_ERR, "unsupported protocol 0x%x\n", td_blkif->proto); + err = EPROTONOSUPPORT; + goto fail; + } + + /* + * Bind to the remote port. + * TODO elaborate + */ + td_blkif->port = xc_evtchn_bind_interdomain(td_blkif->ctx->xce_handle, + td_blkif->domid, port); + if (td_blkif->port == -1) { + err = errno; + syslog(LOG_ERR, "failed to bind to event channel port %d of domain " + "%d: %s\n", port, td_blkif->domid, strerror(err)); + goto fail; + } + + err = tapdisk_xenblkif_reqs_init(td_blkif); + if (err) { + /* TODO log error */ + goto fail; + } + + vbd->tap = td_blkif; + + TAILQ_INSERT_TAIL(&td_ctx->blkifs, td_blkif, entry); + + return 0; + +fail: + if (td_blkif) + tapdisk_xenblkif_destroy(td_blkif); + + return err; +} + diff --git a/tools/blktap3/drivers/sring/td-blkif.h b/tools/blktap3/drivers/sring/td-blkif.h new file mode 100644 --- /dev/null +++ b/tools/blktap3/drivers/sring/td-blkif.h @@ -0,0 +1,168 @@ +/* + * Copyright (C) 2012 Citrix Ltd. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. 
+ * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, + * USA. + */ + +#ifndef __TD_BLKIF_H__ +#define __TD_BLKIF_H__ + +#include <inttypes.h> /* TODO required by xen/event_channel.h */ +#include <xen/event_channel.h> +#include <xenctrl.h> + +#include "blkif.h" +#include "td-req.h" +#include "td-stats.h" +#include "../tapdisk-vbd.h" + +struct td_xenio_ctx; +struct td_vbd_handle; +struct td_xenblkif_stats; + +struct td_xenblkif { + + /** + * The domain ID where the front-end is running. + */ + int domid; + + /** + * The device ID of the VBD. + */ + int devid; + + + /** + * Pointer to the context this block interface belongs to. + */ + struct td_xenio_ctx *ctx; + + /** + * for linked lists. + */ + TAILQ_ENTRY(td_xenblkif) entry; + + /** + * The local port corresponding to the remote port of the domain where the + * front-end is running. We use this to tell for which VBD a pending event + * is, and for notifying the front-end for responses we have produced and + * placed in the shared ring. + */ + evtchn_port_or_error_t port; + + /** + * protocol (native, x86, or x64) + * Need to keep around? Replace with function pointer? + */ + int proto; + + blkif_back_rings_t rings; + + /** + * TODO Why 8 specifically? + * TODO Do we really need to keep it around? + */ + grant_ref_t ring_ref[8]; + + /** + * Number of pages in the ring that holds the request descriptors. + */ + unsigned int ring_n_pages; + + /* + * Size of the ring, expressed in requests. + * TODO Do we really need to keep this around? + */ + int ring_size; + + /** + * Intermediate requests. The array is managed as a stack, with n_reqs_free + * pointing to the top of the stack, at the next available intermediate + * request. + */ + struct td_xenblkif_req *reqs; + + /** + * Stack pointer to the aforementioned stack. 
+ */ + int n_reqs_free; + + blkif_request_t **reqs_free; + + /** + * Pointer to the actual VBD. + */ + struct td_vbd_handle *vbd; + + /** + * stats + */ + struct td_xenblkif_stats stats; +}; + +#define tapdisk_xenio_for_each_ctx(_ctx) \ + TAILQ_FOREACH(_ctx, &_td_xenio_ctxs, entry) + +/** + * Connects the tapdisk to the shared ring. + * + * @param domid the ID of the guest domain + * @param devid the device ID + * @param grefs the grant references + * @param order number of grant references + * @param port event channel port of the guest domain to use for ring + * notifications + * @param proto protocol (native, x86, or x64) + * @param pool name of the context + * @param vbd the VBD + * @returns 0 on success + */ +int +tapdisk_xenblkif_connect(domid_t domid, int devid, const grant_ref_t * grefs, + int order, evtchn_port_t port, int proto, const char *pool, + td_vbd_t * vbd); + +/** + * Disconnects the tapdisk from the shared ring. + * + * @param domid the domain ID of the guest domain + * @param devid the device ID of the VBD + * @returns 0 on success + */ +int +tapdisk_xenblkif_disconnect(const domid_t domid, const int devid); + +/** + * Destroys a XEN block interface. + * + * @param blkif the block interface to destroy + */ +void +tapdisk_xenblkif_destroy(struct td_xenblkif * blkif); + +/** + * Searches all block interfaces in all contexts for a block interface + * having the specified domain and device ID. + * + * @param domid the domain ID + * @param devid the device ID + * @returns a pointer to the block interface if found, else NULL + */ +struct td_xenblkif * +tapdisk_xenblkif_find(const domid_t domid, const int devid); + +#endif /* __TD_BLKIF_H__ */
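The reason blkif.h carries separate x86_32 and x86_64 request structures is that the two ABIs lay out the same fields at different offsets. The following is only a minimal sketch of that difference, assuming GCC or Clang (for #pragma pack and __attribute__((__aligned__))); the demo_* types are hypothetical stand-ins for the real blkif structures, DEMO_MAX_SEGS is assumed to match BLKIF_MAX_SEGMENTS_PER_REQUEST, and uint16_t is assumed for blkif_vdev_t. It also shows why a request in a foreign layout has to be copied field by field rather than with memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_SEGS 11    /* assumed value of BLKIF_MAX_SEGMENTS_PER_REQUEST */

/* Hypothetical stand-in for struct blkif_request_segment. */
struct demo_segment {
    uint32_t gref;
    uint8_t first_sect, last_sect;
};

/* i386 layout: 64-bit fields are only 4-byte aligned. */
#pragma pack(push, 4)
struct demo_x86_32_request {
    uint8_t operation;
    uint8_t nr_segments;
    uint16_t handle;
    uint64_t id;                /* lands at offset 4 */
    uint64_t sector_number;
    struct demo_segment seg[DEMO_MAX_SEGS];
};
#pragma pack(pop)

/* x86_64 layout: id is forced to 8-byte alignment. */
struct demo_x86_64_request {
    uint8_t operation;
    uint8_t nr_segments;
    uint16_t handle;
    uint64_t __attribute__((__aligned__(8))) id;   /* lands at offset 8 */
    uint64_t sector_number;
    struct demo_segment seg[DEMO_MAX_SEGS];
};

/* Field-by-field copy between layouts, in the spirit of blkif_get_req.
 * The segment count is read once and clamped before it is used as a
 * loop bound, since the source may live in memory shared with a guest. */
static void demo_copy_req(struct demo_x86_64_request *dst,
                          const struct demo_x86_32_request *src)
{
    uint8_t nseg = src->nr_segments;
    unsigned int i, n = nseg;

    dst->operation = src->operation;
    dst->nr_segments = nseg;
    dst->handle = src->handle;
    dst->id = src->id;
    dst->sector_number = src->sector_number;

    if (n > DEMO_MAX_SEGS)
        n = DEMO_MAX_SEGS;
    for (i = 0; i < n; i++)
        dst->seg[i] = src->seg[i];
}
```

Because the id field sits at a different offset in each layout, a back-end serving a guest of the other word size cannot simply reinterpret the ring entries; it has to pick the ring type matching the negotiated protocol, which is what the switch on td_blkif->proto does.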
Thanos Makatos
2013-Jul-15 11:41 UTC
[PATCH 3 of 6 RESEND v2] blktap3/sring: extract requests from the ring, grouping of VBDs
This patch introduces the functions that extract requests from the shared ring. When a VBD is created, the file descriptor of the event channel is added to the set of the file descriptors watched by the main select(). When a notification arrives, the callback associated with it extracts requests from the ring. Parsing of the requests and passing them to the tapdisk standard request queue is introduced by another patch in this series. Signed-off-by: Thanos Makatos <thanos.makatos@citrix.com> --- Changed since v1: * Clarified use of grouping of event channels. diff --git a/tools/blktap3/drivers/sring/td-ctx.c b/tools/blktap3/drivers/sring/td-ctx.c new file mode 100644 --- /dev/null +++ b/tools/blktap3/drivers/sring/td-ctx.c @@ -0,0 +1,443 @@ +/* + * Copyright (C) 2012 Citrix Ltd. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, + * USA. + */ + +#include <assert.h> +#include <errno.h> +#include <syslog.h> + +#include "tapdisk-server.h" + +#include "td-ctx.h" + +struct tqh_td_xenio_ctx _td_xenio_ctxs + = TAILQ_HEAD_INITIALIZER(_td_xenio_ctxs); + +/** + * TODO releases a pool? 
+ */ +static void +tapdisk_xenio_ctx_close(struct td_xenio_ctx * const ctx) +{ + assert(ctx); + + if (ctx->ring_event >= 0) { + tapdisk_server_unregister_event(ctx->ring_event); + ctx->ring_event = -1; + } + + if (ctx->xce_handle) { + xc_evtchn_close(ctx->xce_handle); + ctx->xce_handle = NULL; + } + + if (ctx->xcg_handle) { + xc_gnttab_close(ctx->xcg_handle); + ctx->xcg_handle = NULL; + } + + TAILQ_REMOVE(&_td_xenio_ctxs, ctx, entry); + + /* TODO when do we free it? */ +} + +/* + * XXX only called by tapdisk_xenio_ctx_ring_event + */ +static inline struct td_xenblkif * +xenio_pending_blkif(struct td_xenio_ctx * const ctx) +{ + evtchn_port_or_error_t port; + struct td_xenblkif *blkif; + int err; + + assert(ctx); + + /* + * Get the local port for which there is a pending event. + */ + port = xc_evtchn_pending(ctx->xce_handle); + if (port == -1) { + /* TODO log error */ + return NULL; + } + + /* + * Find the block interface with that local port. + */ + tapdisk_xenio_ctx_find_blkif(ctx, blkif, + blkif->port == port); + if (blkif) { + err = xc_evtchn_unmask(ctx->xce_handle, port); + if (err) { + /* TODO log error */ + return NULL; + } + } + /* + * TODO Is it possible to have a pending event channel but no block + * interface associated with it? + */ + + return blkif; +} + +#define blkif_get_req(dst, src) \ +{ \ + int i, n = BLKIF_MAX_SEGMENTS_PER_REQUEST; \ + dst->operation = src->operation; \ + dst->nr_segments = src->nr_segments; \ + dst->handle = src->handle; \ + dst->id = src->id; \ + dst->sector_number = src->sector_number; \ + xen_rmb(); \ + if (n > dst->nr_segments) \ + n = dst->nr_segments; \ + for (i = 0; i < n; i++) \ + dst->seg[i] = src->seg[i]; \ +} + +/** + * Utility function that retrieves a request using @idx as the ring index, + * copying it to the @dst in a H/W independent way. 
+ * + * @param blkif the block interface + * @param dst address that receives the request + * @param idx the index of the request in the ring + */ +static inline void +xenio_blkif_get_request(struct td_xenblkif * const blkif, + blkif_request_t *const dst, const RING_IDX idx) +{ + blkif_back_rings_t * rings; + + assert(blkif); + assert(dst); + + rings = &blkif->rings; + + switch (blkif->proto) { + case BLKIF_PROTOCOL_NATIVE: + { + blkif_request_t *src; + src = RING_GET_REQUEST(&rings->native, idx); + memcpy(dst, src, sizeof(blkif_request_t)); + break; + } + + case BLKIF_PROTOCOL_X86_32: + { + blkif_x86_32_request_t *src; + src = RING_GET_REQUEST(&rings->x86_32, idx); + blkif_get_req(dst, src); + break; + } + + case BLKIF_PROTOCOL_X86_64: + { + blkif_x86_64_request_t *src; + src = RING_GET_REQUEST(&rings->x86_64, idx); + blkif_get_req(dst, src); + break; + } + + default: + /* + * TODO log error + */ + assert(0); + } +} + +/** + * Retrieves at most @count request descriptors from the ring, copying them to + * @reqs. + * + * @param blkif the block interface + * @param reqs array of pointers where each element points to sufficient memory + * space that receives each request descriptor + * @param count retrieve at most that many request descriptors + * @returns the number of retrieved request descriptors + * + * XXX only called by xenio_blkif_get_requests + */ +static inline int +__xenio_blkif_get_requests(struct td_xenblkif * const blkif, + blkif_request_t *reqs[], const unsigned int count) +{ + blkif_common_back_ring_t * ring; + RING_IDX rp, rc; + unsigned int n; + + assert(blkif); + assert(reqs); + + if (!count) + return 0; + + ring = &blkif->rings.common; + + rp = ring->sring->req_prod; + xen_rmb(); /* TODO why? 
+ */
+
+    for (rc = ring->req_cons, n = 0; rc != rp; rc++, n++) {
+        blkif_request_t *dst;
+
+        if (n >= count)
+            break;
+        dst = reqs[n];
+        xenio_blkif_get_request(blkif, dst, rc);
+    }
+
+    ring->req_cons = rc;
+
+    return n;
+}
+
+/**
+ * Retrieves at most @count request descriptors.
+ *
+ * @param blkif the block interface
+ * @param reqs array of pointers where each pointer points to sufficient
+ * memory to hold a request descriptor
+ * @param count maximum number of request descriptors to retrieve
+ * @param final re-enable notifications before it stops reading
+ * @returns the number of request descriptors retrieved
+ *
+ * TODO change name
+ */
+static inline int
+xenio_blkif_get_requests(struct td_xenblkif * const blkif,
+        blkif_request_t *reqs[], const int count, const int final)
+{
+    blkif_common_back_ring_t * ring;
+    int n = 0;
+    int work = 0;
+
+    assert(blkif);
+    assert(reqs);
+    assert(count > 0);
+
+    ring = &blkif->rings.common;
+
+    do {
+        if (final)
+            RING_FINAL_CHECK_FOR_REQUESTS(ring, work);
+        else
+            work = RING_HAS_UNCONSUMED_REQUESTS(ring);
+
+        if (!work)
+            break;
+
+        if (n >= count)
+            break;
+
+        n += __xenio_blkif_get_requests(blkif, reqs + n, count - n);
+    } while (1);
+
+    return n;
+}
+
+/**
+ * Callback executed when there is a request descriptor in the ring. Copies as
+ * many request descriptors as possible (limited by local buffer space) to the
+ * td_blkif's local request buffer and queues them to the tapdisk queue.
+ */ +static inline void +tapdisk_xenio_ctx_ring_event(event_id_t id __attribute__((unused)), + char mode __attribute__((unused)), void *private) +{ + struct td_xenio_ctx *ctx = private; + struct td_xenblkif *blkif = NULL; + int n_reqs; + int final = 0; + int start; + blkif_request_t **reqs; + + assert(ctx); + + blkif = xenio_pending_blkif(ctx); + if (!blkif) { + /* TODO log error */ + return; + } + + start = blkif->n_reqs_free; + blkif->stats.kicks.in++; + + /* + * In each iteration, copy as many request descriptors from the shared ring + * that can fit in the td_blkif''s buffer. + */ + do { + reqs = &blkif->reqs_free[blkif->ring_size - blkif->n_reqs_free]; + + assert(reqs); + + n_reqs = xenio_blkif_get_requests(blkif, reqs, blkif->n_reqs_free, + final); + assert(n_reqs >= 0); + if (!n_reqs) + break; + + blkif->n_reqs_free -= n_reqs; + final = 1; + + } while (1); + + n_reqs = start - blkif->n_reqs_free; + if (!n_reqs) + /* TODO If there are no requests to be copied, why was there a + * notification in the first place? + */ + return; + blkif->stats.reqs.in += n_reqs; + reqs = &blkif->reqs_free[blkif->ring_size - start]; + tapdisk_xenblkif_queue_requests(blkif, reqs, n_reqs); +} + +/* NB. may be NULL, but then the image must be bouncing I/O */ +#define TD_XENBLKIF_DEFAULT_POOL "td-xenio-default" + +/** + * Opens a context on the specified pool. + * + * @param pool the pool, it can either be NULL or a non-zero length string + * @returns 0 in success + * + * TODO The pool is ignored, we always open the default pool. + */ +static inline int +tapdisk_xenio_ctx_open(const char *pool) +{ + struct td_xenio_ctx *ctx; + int fd, err; + + /* zero-length pool names are not allowed */ + if (pool && !strlen(pool)) + return EINVAL; + + ctx = calloc(1, sizeof(*ctx)); + if (!ctx) { + err = errno; + syslog(LOG_ERR, "cannot allocate memory"); + goto fail; + } + + ctx->ring_event = -1; /* TODO is there a special value? 
*/ + ctx->pool = TD_XENBLKIF_DEFAULT_POOL; + TAILQ_INIT(&ctx->blkifs); + TAILQ_INSERT_HEAD(&_td_xenio_ctxs, ctx, entry); + + ctx->xce_handle = xc_evtchn_open(NULL, 0); + if (ctx->xce_handle == NULL) { + err = errno; + syslog(LOG_ERR, "failed to open the event channel driver: %s\n", + strerror(err)); + goto fail; + } + + ctx->xcg_handle = xc_gnttab_open(NULL, 0); + if (ctx->xcg_handle == NULL) { + err = errno; + syslog(LOG_ERR, "failed to open the grant table driver: %s\n", + strerror(err)); + goto fail; + } + + fd = xc_evtchn_fd(ctx->xce_handle); + if (fd < 0) { + err = errno; + syslog(LOG_ERR, "failed to get the event channel file descriptor: %s\n", + strerror(err)); + goto fail; + } + + ctx->ring_event = tapdisk_server_register_event(SCHEDULER_POLL_READ_FD, + fd, 0, tapdisk_xenio_ctx_ring_event, ctx); + if (ctx->ring_event < 0) { + err = -ctx->ring_event; + syslog(LOG_ERR, "failed to register event: %s\n", strerror(err)); + goto fail; + } + + return 0; + +fail: + tapdisk_xenio_ctx_close(ctx); + return err; +} + + +/** + * Tells whether @ctx belongs to @pool. + * + * If no @pool is not specified and a default pool is set, @ctx is compared + * against the default pool. Note that NULL is valid pool name value. 
+ */ +static inline int +__td_xenio_ctx_match(struct td_xenio_ctx * ctx, const char *pool) +{ + if (unlikely(!pool)) { + if (NULL != TD_XENBLKIF_DEFAULT_POOL) + return !strcmp(ctx->pool, TD_XENBLKIF_DEFAULT_POOL); + else + return !ctx->pool; + } + + return !strcmp(ctx->pool, pool); +} + +#define tapdisk_xenio_find_ctx(_ctx, _cond) \ + do { \ + int found = 0; \ + tapdisk_xenio_for_each_ctx(_ctx) { \ + if (_cond) { \ + found = 1; \ + break; \ + } \ + } \ + if (!found) \ + _ctx = NULL; \ + } while (0) + +int +tapdisk_xenio_ctx_get(const char *pool, struct td_xenio_ctx ** _ctx) +{ + struct td_xenio_ctx *ctx; + int err = 0; + + do { + tapdisk_xenio_find_ctx(ctx, __td_xenio_ctx_match(ctx, pool)); + if (ctx) { + *_ctx = ctx; + return 0; + } + + err = tapdisk_xenio_ctx_open(pool); + } while (!err); + + return err; +} + +void +tapdisk_xenio_ctx_put(struct td_xenio_ctx * ctx) +{ + if (TAILQ_EMPTY(&ctx->blkifs)) + tapdisk_xenio_ctx_close(ctx); +} diff --git a/tools/blktap3/drivers/sring/td-ctx.h b/tools/blktap3/drivers/sring/td-ctx.h new file mode 100644 --- /dev/null +++ b/tools/blktap3/drivers/sring/td-ctx.h @@ -0,0 +1,113 @@ +/* + * Copyright (C) 2012 Citrix Ltd. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, + * USA. 
+ */ + +#ifndef __TD_CTX_H__ +#define __TD_CTX_H__ + +#include "td-blkif.h" +#include <xenctrl.h> +#include "scheduler.h" + +TAILQ_HEAD(tqh_td_xenblkif, td_xenblkif); + +/** + * A VBD context: groups two or more VBDs of the same tapdisk. + * + * TODO The purpose of this struct is dubious: it allows one or more VBDs + * belonging to the same tapdisk process to share the same handle to the event + * channel driver. This means that when an event triggers on some event + * channel, this notification will be delivered via the file descriptor of the + * handle. Thus, we need to look which exactly event channel triggered. This + * functionality is a trade off between reducing the total amount of open + * handles to the event channel driver versus speeding up the data path. Also, + * it effectively allows for more event channels to be polled by select. The + * bottom line is that if we use one VBD per tapdisk this functionality is + * unnecessary. + */ +struct td_xenio_ctx { + char *pool; /* TODO rename to pool_name */ + + /** + * Handle to the grant table driver. + */ + xc_gnttab *xcg_handle; + + /** + * Handle to the event channel driver. + */ + xc_evtchn *xce_handle; + + /** + * Return value of tapdisk_server_register_event, we use this to tell + * whether the context is registered. + */ + event_id_t ring_event; + + /** + * block interfaces in this pool + */ + struct tqh_td_xenblkif blkifs; + + /** + * for linked lists + */ + TAILQ_ENTRY(td_xenio_ctx) entry; +}; + +/** + * Retrieves the context corresponding to the specified pool name, creating it + * if it doesn''t already exist. + */ +int +tapdisk_xenio_ctx_get(const char *pool, struct td_xenio_ctx ** _ctx); + +/** + * Releases the pool, only if there is no block interface using it. + */ +void +tapdisk_xenio_ctx_put(struct td_xenio_ctx * ctx); + +/** + * List of contexts. + */ +extern TAILQ_HEAD(tqh_td_xenio_ctx, td_xenio_ctx) _td_xenio_ctxs; + +/** + * For each block interface of this context... 
+ */ +#define tapdisk_xenio_for_each_blkif(_blkif, _ctx) \ + TAILQ_FOREACH(_blkif, &(_ctx)->blkifs, entry) + +/** + * Search this context for the block interface for which the condition is true. + */ +#define tapdisk_xenio_ctx_find_blkif(_ctx, _blkif, _cond) \ + do { \ + int found = 0; \ + tapdisk_xenio_for_each_blkif(_blkif, _ctx) { \ + if (_cond) { \ + found = 1; \ + break; \ + } \ + } \ + if (!found) \ + _blkif = NULL; \ + } while (0) + +#endif /* __TD_CTX_H__ */
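
The request-retrieval code in this patch walks the ring from the private consumer index up to a snapshot of the producer index, stopping early when the caller's buffer is full. A minimal, self-contained sketch of that pattern follows, assuming a toy ring of ints. The names demo_ring, demo_ring_consume and DEMO_RING_SIZE are hypothetical and are not part of blktap3 or the Xen ring headers; the real code additionally needs a read barrier after sampling the producer index.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_RING_SIZE 8u /* hypothetical size; must be a power of two */

struct demo_ring {
    uint32_t req_prod;          /* free-running index, advanced by the producer */
    uint32_t req_cons;          /* free-running index, advanced by the consumer */
    int slots[DEMO_RING_SIZE];
};

/*
 * Copies at most count pending entries to out and returns how many were
 * copied. Both indices are free-running, so they are masked only when a slot
 * is accessed. The bound on n is checked before out[n] is touched.
 */
static unsigned int
demo_ring_consume(struct demo_ring *ring, int *out, unsigned int count)
{
    uint32_t rp = ring->req_prod; /* snapshot of the producer index */
    uint32_t rc;
    unsigned int n = 0;

    for (rc = ring->req_cons; rc != rp && n < count; rc++, n++)
        out[n] = ring->slots[rc & (DEMO_RING_SIZE - 1)];

    ring->req_cons = rc; /* entries left behind are picked up by the next call */
    return n;
}
```

Putting the bound in the loop condition means the return value can never exceed count, and the consumer index only moves past entries that were actually copied.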
Thanos Makatos
2013-Jul-15 11:41 UTC
[PATCH 4 of 6 RESEND v2] blktap3/sring: parse ring requests and hand them over to tapdisk
This patch introduces the functionality of parsing shared ring requests and adding them to tapdisk''s standard request queue. Also, it provides the functionality of producing responses. Signed-off-by: Thanos Makatos <thanos.makatos@citrix.com> diff --git a/tools/blktap3/drivers/sring/td-req.c b/tools/blktap3/drivers/sring/td-req.c new file mode 100644 --- /dev/null +++ b/tools/blktap3/drivers/sring/td-req.c @@ -0,0 +1,505 @@ +/* + * Copyright (C) 2012 Citrix Ltd. + * + * This program is free software; you can redistribute it and/or + * modify it under the terms of the GNU General Public License + * as published by the Free Software Foundation; either version 2 + * of the License, or (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, + * USA. + */ + +#include "td-req.h" +#include "td-blkif.h" +#include <assert.h> +#include <stdlib.h> +#include <errno.h> +#include "td-ctx.h" +#include <syslog.h> +#include <inttypes.h> +#include "tapdisk-vbd.h" + +/* + * TODO needed for PROT_READ/PROT_WRITE, probably some xen header supplies them + * too + */ +#include <sys/mman.h> + +/** + * Puts the request back to the free list of this block interface. 
+ * + * @param blkif the block interface + * @param tapreq the request to give back + */ +static void +tapdisk_xenblkif_free_request(struct td_xenblkif * const blkif, + struct td_xenblkif_req * const tapreq) +{ + assert(blkif); + assert(tapreq); + assert(blkif->n_reqs_free <= blkif->ring_size); + + blkif->reqs_free[blkif->ring_size - (++blkif->n_reqs_free)] = &tapreq->msg; +} + +/** + * Returns the size, in request descriptors, of the shared ring + * + * @param blkif the block interface + * @returns the size, in request descriptors, of the shared ring + */ +static int +td_blkif_ring_size(const struct td_xenblkif * const blkif) +{ + assert(blkif); + + switch (blkif->proto) { + case BLKIF_PROTOCOL_NATIVE: + return RING_SIZE(&blkif->rings.native); + + case BLKIF_PROTOCOL_X86_32: + return RING_SIZE(&blkif->rings.x86_32); + + case BLKIF_PROTOCOL_X86_64: + return RING_SIZE(&blkif->rings.x86_64); + + default: + return -EPROTONOSUPPORT; + } +} + +/** + * Unmaps a request''s data. + * + * @param blkif the block interface the request belongs to + * @param req the request to unmap + */ +static int +xenio_blkif_munmap_one(struct td_xenblkif * const blkif, + struct td_xenblkif_req * const req) +{ + struct td_xenio_ctx *ctx; + int err; + + assert(blkif); + assert(req); + + ctx = blkif->ctx; + assert(ctx); + + err = xc_gnttab_munmap(ctx->xcg_handle, req->vma, req->nr_segments); + if (err) { + err = errno; + /* TODO don''t use syslog for error on the data path */ + syslog(LOG_ERR, "failed to unmap pages: %s\n", strerror(err)); + return err; + } + + req->vma = NULL; + return 0; +} + +/** + * Get the response that corresponds to the specified ring index in a H/W + * independent way. 
+ * + * TODO use function pointers instead of switch + * XXX only called by xenio_blkif_put_response + */ +static inline +blkif_response_t *xenio_blkif_get_response(struct td_xenblkif* const blkif, + const RING_IDX rp) +{ + blkif_back_rings_t * const rings = &blkif->rings; + blkif_response_t * p = NULL; + + switch (blkif->proto) { + case BLKIF_PROTOCOL_NATIVE: + p = (blkif_response_t *) RING_GET_RESPONSE(&rings->native, rp); + break; + case BLKIF_PROTOCOL_X86_32: + p = (blkif_response_t *) RING_GET_RESPONSE(&rings->x86_32, rp); + break; + case BLKIF_PROTOCOL_X86_64: + p = (blkif_response_t *) RING_GET_RESPONSE(&rings->x86_64, rp); + break; + default: + /* TODO gracefully fail? */ + abort(); + } + + return p; +} + +/** + * Puts a response in the ring. + * + * @param blkif the VBD + * @param req the request for which the response should be put + * @param status the status of the response (success or an error code) + * @param final controls whether the front-end will be notified, if necessary + * + * TODO @req can be NULL so the function will only notify the other end. This + * is used in the error path of tapdisk_xenblkif_queue_requests. The point is + * that the other will just be notified, does this make sense? + */ +static int +xenio_blkif_put_response(struct td_xenblkif * const blkif, + struct td_xenblkif_req *req, int const status, int const final) +{ + blkif_common_back_ring_t * const ring = &blkif->rings.common; + + if (req) { + blkif_response_t * msg = xenio_blkif_get_response(blkif, + ring->rsp_prod_pvt); + assert(msg); + + assert(status == BLKIF_RSP_EOPNOTSUPP || status == BLKIF_RSP_ERROR + || status == BLKIF_RSP_OKAY); + + msg->id = req->id; + + /* TODO Why do we have to set this? 
*/ + msg->operation = req->op; + + msg->status = status; + + ring->rsp_prod_pvt++; + } + + if (final) { + int notify; + RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(ring, notify); + if (notify) { + int err = xc_evtchn_notify(blkif->ctx->xce_handle, blkif->port); + if (err < 0) { + /* TODO log error */ + return errno; + } + } + } + + return 0; +} + +/** + * Completes a request. + * + * @blkif the VBD the request belongs belongs to + * @tapreq the request to complete + * @error completion status of the request + * @final controls whether the other end should be notified + */ +static void +tapdisk_xenblkif_complete_request(struct td_xenblkif * const blkif, + struct td_xenblkif_req* tapreq, const int error, const int final) +{ + assert(blkif); + assert(tapreq); + + if (tapreq->vma) { + int err = 0; + err = xenio_blkif_munmap_one(blkif, tapreq); + /* TODO How do we deal with errors here? */ + } + + xenio_blkif_put_response(blkif, tapreq, + (error ? BLKIF_RSP_ERROR : BLKIF_RSP_OKAY), final); + + tapdisk_xenblkif_free_request(blkif, tapreq); + + blkif->stats.reqs.out++; + if (final) + blkif->stats.kicks.out++; +} + +/** + * Request completion callback, executed when the tapdisk has finished + * processing the request. + * + * @param vreq the completed request + * @param error status of the request + * @param token token previously associated with this request + * @param final TODO ? + */ +static inline void +__tapdisk_xenblkif_request_cb(struct td_vbd_request * const vreq, + const int error, void * const token, const int final) +{ + struct td_xenblkif_req *tapreq; + struct td_xenblkif * const blkif = token; + + assert(vreq); + assert(blkif); + + tapreq = containerof(vreq, struct td_xenblkif_req, vreq); + + tapdisk_xenblkif_complete_request(blkif, tapreq, error, final); + if (error) + blkif->stats.errors.img++; +} + +/** + * Initialises the standard tapdisk request (td_vbd_request_t) from the + * intermediate ring request (td_xenblkif_req) in order to prepare it + * processing. 
+ * + * @param blkif the block interface + * @param tapreq the request to prepare + * @returns 0 on success + * + * TODO only called by tapdisk_xenblkif_queue_request + */ +static inline int +tapdisk_xenblkif_make_vbd_request(struct td_xenblkif * const blkif, + struct td_xenblkif_req * const tapreq) +{ + td_vbd_request_t *vreq; + int i; + struct td_iovec *iov; + void *page, *next, *last; + int prot; + grant_ref_t gref[BLKIF_MAX_SEGMENTS_PER_REQUEST]; + + assert(tapreq); + + vreq = &tapreq->vreq; + assert(vreq); + + switch (tapreq->msg.operation) { + case BLKIF_OP_READ: + tapreq->op = BLKIF_OP_READ; + vreq->op = TD_OP_READ; + prot = PROT_WRITE; + break; + case BLKIF_OP_WRITE: + tapreq->op = BLKIF_OP_WRITE; + vreq->op = TD_OP_WRITE; + prot = PROT_READ; + break; + default: + /* TODO log error */ + return EOPNOTSUPP; + } + + /* TODO there should be at least one segment, right? */ + if (tapreq->msg.nr_segments < 1 + || tapreq->msg.nr_segments > BLKIF_MAX_SEGMENTS_PER_REQUEST) { + /* TODO log error */ + return EINVAL; + } + + for (i = 0; i < tapreq->msg.nr_segments; i++) { + struct blkif_request_segment *seg = &tapreq->msg.seg[i]; + gref[i] = seg->gref; + + /* + * Note that first and last may be equal, which means only one sector + * must be transferred. + */ + if (seg->last_sect < seg->first_sect) { + /* TODO log error */ + return EINVAL; + } + } + + tapreq->nr_segments = tapreq->msg.nr_segments; + + /* + * Map the request''s data. + */ + tapreq->vma = xc_gnttab_map_domain_grant_refs(blkif->ctx->xcg_handle, + tapreq->nr_segments, blkif->domid, gref, prot); + if (!tapreq->vma) { + /* TODO log error */ + return errno; + } + + tapreq->id = tapreq->msg.id; + + /* + * Vectorizes the request: creates the struct iovec (in tapreq->iov) that + * describes each segment to be transferred. Also, merges consecutive + * segments. + * TODO The following piece of code would be much simpler if we didn''t + * merge segments, right? 
+ * + * In each loop, iov points to the previous scatter/gather element in order + * to reuse it if the current and previous segments are consecutive. + */ + iov = tapreq->iov - 1; + last = NULL; + page = tapreq->vma; + + for (i = 0; i < tapreq->nr_segments; i++) { /* for each segment */ + struct blkif_request_segment *seg = &tapreq->msg.seg[i]; + size_t size; + + /* TODO check that first_sect/last_sect are within page */ + + next = page + (seg->first_sect << SECTOR_SHIFT); + size = seg->last_sect - seg->first_sect + 1; + + if (next != last) { + iov++; + iov->base = next; + iov->secs = size; + } else /* The "else" is true if fist_sect is 0. */ + iov->secs += size; + + last = iov->base + (iov->secs << SECTOR_SHIFT); + page += XC_PAGE_SIZE; + } + + vreq->iov = tapreq->iov; + vreq->iovcnt = iov - tapreq->iov + 1; + vreq->sec = tapreq->msg.sector_number; + + /* + * TODO Isn''t this kind of expensive to do for each requests? Why does the + * tapdisk need this in the first place? + */ + snprintf(tapreq->name, sizeof(tapreq->name), "xenvbd-%d-%d.%"SCNx64"", + blkif->domid, blkif->devid, tapreq->msg.id); + + vreq->name = tapreq->name; + vreq->token = blkif; + vreq->cb = __tapdisk_xenblkif_request_cb; + + return 0; +} + +#define msg_to_tapreq(_req) \ + containerof(_req, struct td_xenblkif_req, msg) + +/** + * Queues a ring request, after it prepares it, to the standard taodisk queue + * for processing. 
+ * + * @param blkif the block interface + * @param msg the ring request + * @param tapreq the intermediate request + * + * TODO don''t really need to supply the ring request since it''s either way + * contained in the tapreq + * + * XXX only called by tapdisk_xenblkif_queue_requests + */ +static inline int +tapdisk_xenblkif_queue_request(struct td_xenblkif * const blkif, + blkif_request_t *msg, struct td_xenblkif_req *tapreq) +{ + int err; + + assert(blkif); + assert(msg); + assert(tapreq); + + err = tapdisk_xenblkif_make_vbd_request(blkif, tapreq); + if (err) { + /* TODO log error */ + blkif->stats.errors.map++; + return err; + } + + err = tapdisk_vbd_queue_request(blkif->vbd, &tapreq->vreq); + if (err) { + /* TODO log error */ + blkif->stats.errors.vbd++; + return err; + } + + return 0; +} + +void +tapdisk_xenblkif_queue_requests(struct td_xenblkif * const blkif, + blkif_request_t *reqs[], const int nr_reqs) +{ + int i; + int err; + int nr_errors = 0; + + assert(blkif); + assert(reqs); + assert(nr_reqs >= 0); + + for (i = 0; i < nr_reqs; i++) { /* for each request from the ring... 
+ */
+        blkif_request_t *msg = reqs[i];
+        struct td_xenblkif_req *tapreq;
+
+        assert(msg);
+
+        tapreq = msg_to_tapreq(msg);
+
+        assert(tapreq);
+
+        err = tapdisk_xenblkif_queue_request(blkif, msg, tapreq);
+        if (err) {
+            /* TODO log error */
+            nr_errors++;
+            tapdisk_xenblkif_complete_request(blkif, tapreq, err, 1);
+        }
+    }
+
+    if (nr_errors)
+        xenio_blkif_put_response(blkif, NULL, 0, 1);
+}
+
+void
+tapdisk_xenblkif_reqs_free(struct td_xenblkif * const blkif)
+{
+    assert(blkif);
+
+    free(blkif->reqs);
+    blkif->reqs = NULL;
+
+    free(blkif->reqs_free);
+    blkif->reqs_free = NULL;
+}
+
+int
+tapdisk_xenblkif_reqs_init(struct td_xenblkif *td_blkif)
+{
+    int i = 0;
+    int err = 0;
+
+    assert(td_blkif);
+
+    td_blkif->ring_size = td_blkif_ring_size(td_blkif);
+    assert(td_blkif->ring_size > 0);
+
+    td_blkif->reqs =
+        malloc(td_blkif->ring_size * sizeof(struct td_xenblkif_req));
+    if (!td_blkif->reqs) {
+        /* TODO log error */
+        err = -errno;
+        goto fail;
+    }
+
+    td_blkif->reqs_free =
+        malloc(td_blkif->ring_size * sizeof(struct xenio_blkif_req *));
+    if (!td_blkif->reqs_free) {
+        /* TODO log error */
+        err = -errno;
+        goto fail;
+    }
+
+    td_blkif->n_reqs_free = 0;
+    for (i = 0; i < td_blkif->ring_size; i++)
+        tapdisk_xenblkif_free_request(td_blkif, &td_blkif->reqs[i]);
+
+    return 0;
+
+fail:
+    tapdisk_xenblkif_reqs_free(td_blkif);
+    return err;
+}
diff --git a/tools/blktap3/drivers/sring/td-req.h b/tools/blktap3/drivers/sring/td-req.h
new file mode 100644
--- /dev/null
+++ b/tools/blktap3/drivers/sring/td-req.h
@@ -0,0 +1,131 @@
+/*
+ * Copyright (C) 2012 Citrix Ltd.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, + * USA. + */ + +#ifndef __TD_REQ_H__ +#define __TD_REQ_H__ + +#include "tapdisk.h" +#include <sys/types.h> +#include <xen/io/blkif.h> +#include <sys/uio.h> +#include "td-blkif.h" + +/** + * Representation of the intermediate request used to retrieve a request from + * the shared ring and handle it over to the main tapdisk request processing + * routine. We could merge it into td_vbd_request_t or define it inside + * td_vbd_request_t, but keeping it separate simplifies keeping Xen stuff + * outside tapdisk. + * + * TODO rename to something better, e.g. ring_req? + */ +struct td_xenblkif_req { + /** + * A request descriptor in the ring. We need to copy the descriptors + * because the guest may modify it while we''re using it. Note that we only + * copy the descriptor and not the actual data, the guest is free to modify + * the data and corrupt itself if it wants to. + */ + blkif_request_t msg; + + /** + * tapdisk''s representation of the request. + */ + td_vbd_request_t vreq; + + /* + * TODO xenio_blkif_get_request copies the request from the shared ring + * locally, into this.msg, so don''t need to keep a copy of id, op, and + * nr_segments. + */ + + /** + * Request id, must be echoed in response, according to the definition of + * blkif_request. + */ + uint64_t id; + + /** + * operation: read/write + * TODO We maintain this here because we set it in the message when + * pushing the response. The question is whether we really need to set it + * in the first place. 
+ *
+ * TODO Do we have to keep it here because blkif_request_t may be changed
+ * by the guest?
+ */
+    uint8_t op;
+
+    /**
+     * Number of segments.
+     *
+     * TODO Do we have to keep it here because blkif_request_t may be changed
+     * by the guest?
+     */
+    int nr_segments;
+
+    /**
+     * Pointer to memory-mapped grant refs. We keep this around because we need
+     * to pass it to xc_gnttab_munmap when the request is completed.
+     */
+    void *vma;
+
+    /*
+     * TODO Why 16+1? This member is copied to the corresponding one in
+     * td_vbd_request_t, so check the limit of that, if there is one.
+     */
+    char name[16 + 1];
+
+    /**
+     * The scatter/gather list td_vbd_request_t.iov points to.
+     */
+    struct td_iovec iov[BLKIF_MAX_SEGMENTS_PER_REQUEST];
+};
+
+struct td_xenblkif;
+
+/**
+ * Queues the requests to the standard tapdisk queue.
+ *
+ * @param blkif the block interface corresponding to the VBD
+ * @param reqs array holding the request descriptors
+ * @param nr_reqs number of requests in the array
+ */
+void
+tapdisk_xenblkif_queue_requests(struct td_xenblkif * const blkif,
+        blkif_request_t *reqs[], const int nr_reqs);
+
+/**
+ * Initialises the intermediate requests of this block interface.
+ *
+ * @param td_blkif the block interface whose requests must be initialised
+ * @returns 0 on success
+ */
+int
+tapdisk_xenblkif_reqs_init(struct td_xenblkif *td_blkif);
+
+/**
+ * Releases all the requests of the block interface.
+ *
+ * @param blkif the block interface whose requests should be freed
+ */
+void
+tapdisk_xenblkif_reqs_free(struct td_xenblkif * const blkif);
+
+#endif /* __TD_REQ_H__ */
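
The vectorizing step in td-req.c turns the per-page segments of a ring request into a scatter/gather list, merging a segment into the previous element when its data starts exactly where the previous element ends. A minimal sketch of that merging rule follows, assuming 512-byte sectors and 4096-byte pages. The names demo_seg, demo_iov and demo_vectorize are hypothetical and only mirror the shape of the structures in the patch; this version tracks the element count instead of starting from a pointer one element before the array.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_SECTOR_SHIFT 9     /* assumed 512-byte sectors */
#define DEMO_PAGE_SIZE    4096  /* assumed page size */

struct demo_seg { unsigned int first_sect, last_sect; }; /* inclusive range */
struct demo_iov { char *base; size_t secs; };

/*
 * Builds the scatter/gather list for nr_segs segments, where segment i lives
 * in page i of the mapping at vma. Returns the number of elements used; iov
 * must have room for nr_segs elements.
 */
static int
demo_vectorize(char *vma, const struct demo_seg *segs, int nr_segs,
        struct demo_iov *iov)
{
    char *page = vma;
    char *last = NULL; /* end of the most recent element */
    int cnt = 0;
    int i;

    for (i = 0; i < nr_segs; i++) {
        char *next = page + ((size_t)segs[i].first_sect << DEMO_SECTOR_SHIFT);
        size_t secs = segs[i].last_sect - segs[i].first_sect + 1;

        if (cnt > 0 && next == last) {
            /* contiguous with the previous element: extend it */
            iov[cnt - 1].secs += secs;
        } else {
            iov[cnt].base = next;
            iov[cnt].secs = secs;
            cnt++;
        }

        last = iov[cnt - 1].base + (iov[cnt - 1].secs << DEMO_SECTOR_SHIFT);
        page += DEMO_PAGE_SIZE;
    }

    return cnt;
}
```

Two segments merge only when the first one runs to the end of its page and the second one starts at sector 0 of the next page, since that is the only case in which the mapped addresses are adjacent.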
Thanos Makatos
2013-Jul-15 11:41 UTC
[PATCH 5 of 6 RESEND v2] blktap3/sring: stats for the shared ring between tapdisk and the front-end
This patch introduces stats for the shared ring between tapdisk and the
front-end. I suspect that the stats regarding the ring between tapdisk and
blktap don't make sense any more, so the stats introduced by this patch could
be consolidated into the existing ones.

Signed-off-by: Thanos Makatos <thanos.makatos@citrix.com>
---
Changed since v1:
* Simplify sring stats procuring since the VBD now points to the internal
  sring structure.
* Introduce declaration of function tapdisk_xenblkif_stats.

diff --git a/tools/blktap3/drivers/sring/td-stats.c b/tools/blktap3/drivers/sring/td-stats.c
new file mode 100644
--- /dev/null
+++ b/tools/blktap3/drivers/sring/td-stats.c
@@ -0,0 +1,60 @@
+/*
+ * Copyright (C) 2012 Citrix Ltd.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301,
+ * USA.
+ */
+
+#include "td-stats.h"
+#include "td-ctx.h"
+
+static inline void
+__tapdisk_xenblkif_stats(struct td_xenblkif * blkif, td_stats_t * st)
+{
+    assert(blkif);
+    assert(st);
+    assert(blkif->ctx);
+
+    tapdisk_stats_field(st, "pool", "s", blkif->ctx->pool);
+    tapdisk_stats_field(st, "domid", "d", blkif->domid);
+    tapdisk_stats_field(st, "devid", "d", blkif->devid);
+
+    tapdisk_stats_field(st, "reqs", "[");
+    tapdisk_stats_val(st, "llu", blkif->stats.reqs.in);
+    tapdisk_stats_val(st, "llu", blkif->stats.reqs.out);
+    tapdisk_stats_leave(st, ']');
+
+    tapdisk_stats_field(st, "kicks", "[");
+    tapdisk_stats_val(st, "llu", blkif->stats.kicks.in);
+    tapdisk_stats_val(st, "llu", blkif->stats.kicks.out);
+    tapdisk_stats_leave(st, ']');
+
+    tapdisk_stats_field(st, "errors", "{");
+    tapdisk_stats_field(st, "msg", "llu", blkif->stats.errors.msg);
+    tapdisk_stats_field(st, "map", "llu", blkif->stats.errors.map);
+    tapdisk_stats_field(st, "vbd", "llu", blkif->stats.errors.vbd);
+    tapdisk_stats_field(st, "img", "llu", blkif->stats.errors.img);
+    tapdisk_stats_leave(st, '}');
+}
+
+void
+tapdisk_xenblkif_stats(struct td_xenblkif * blkif, td_stats_t * st)
+{
+    tapdisk_stats_field(st, "xen-blkifs", "[");
+    tapdisk_stats_enter(st, '{');
+    __tapdisk_xenblkif_stats(blkif, st);
+    tapdisk_stats_leave(st, '}');
+    tapdisk_stats_leave(st, ']');
+}
diff --git a/tools/blktap3/drivers/sring/td-stats.h b/tools/blktap3/drivers/sring/td-stats.h
new file mode 100644
--- /dev/null
+++ b/tools/blktap3/drivers/sring/td-stats.h
@@ -0,0 +1,46 @@
+/*
+ * Copyright (C) 2012 Citrix Ltd.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version 2
+ * of the License, or (at your option) any later version.
+ * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, + * USA. + */ + +#ifndef __TD_STATS_H__ +#define __TD_STATS_H__ + +struct td_xenblkif_stats { + struct { + unsigned long long in; + unsigned long long out; + } reqs; + struct { + unsigned long long in; + unsigned long long out; + } kicks; + struct { + unsigned long long msg; + unsigned long long map; + unsigned long long vbd; + unsigned long long img; + } errors; +}; + +#include "td-blkif.h" +struct td_xenblkif; + +void +tapdisk_xenblkif_stats(struct td_xenblkif * blkif, td_stats_t * st); + +#endif /* __TD_STATS_H__ */
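
The counters above are grouped as two [in, out] pairs plus a nested object of per-category error counts. A minimal sketch of rendering such a structure follows, assuming a JSON-like layout. The exact output format of the tapdisk_stats_* helpers is not shown in this series, so the layout here is an assumption, and demo_stats and demo_stats_format are hypothetical names that only mirror td_xenblkif_stats.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical mirror of the counters declared in td-stats.h. */
struct demo_stats {
    struct { unsigned long long in, out; } reqs;
    struct { unsigned long long in, out; } kicks;
    struct { unsigned long long msg, map, vbd, img; } errors;
};

/*
 * Formats the counters into buf as one JSON-like object. Returns what
 * snprintf returns: the length that was (or would have been) written.
 */
static int
demo_stats_format(const struct demo_stats *s, char *buf, size_t len)
{
    return snprintf(buf, len,
            "{ \"reqs\": [ %llu, %llu ], \"kicks\": [ %llu, %llu ], "
            "\"errors\": { \"msg\": %llu, \"map\": %llu, "
            "\"vbd\": %llu, \"img\": %llu } }",
            s->reqs.in, s->reqs.out, s->kicks.in, s->kicks.out,
            s->errors.msg, s->errors.map, s->errors.vbd, s->errors.img);
}
```

Every opening bracket or brace in the format string has a closer of the same kind, which is the property the enter/leave pairs in td-stats.c have to preserve.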
Thanos Makatos
2013-Jul-15 11:41 UTC
[PATCH 6 of 6 RESEND v2] blktap3/sring: Compile the shared ring into a static library
Signed-off-by: Thanos Makatos <thanos.makatos@citrix.com>
---
Changed since v1:
* Introduces sring stats in compilation.

diff --git a/tools/blktap3/drivers/sring/Makefile b/tools/blktap3/drivers/sring/Makefile
new file mode 100644
--- /dev/null
+++ b/tools/blktap3/drivers/sring/Makefile
@@ -0,0 +1,34 @@
+XEN_ROOT=$(CURDIR)/../../../..
+include $(XEN_ROOT)/tools/Rules.mk
+
+BLKTAP_ROOT = ../..
+
+override CFLAGS += \
+	-fno-strict-aliasing \
+	-I$(BLKTAP_ROOT)/include \
+	-I$(BLKTAP_ROOT)/drivers \
+	$(CFLAGS_libxenctrl) \
+	$(CFLAGS_libxenstore) \
+	$(CFLAGS_xeninclude) \
+	-Werror \
+	-Wall \
+	-Wextra
+
+override LDFLAGS += \
+	$(LDLIBS_libxenstore) \
+	$(LDFLAGS_libxenctrl)
+
+SRING-OBJS := td-blkif.o
+SRING-OBJS += td-req.o
+SRING-OBJS += td-ctx.o
+SRING-OBJS += td-stats.o
+
+libsring.a: $(SRING-OBJS)
+	ar rcs libsring.a $(SRING-OBJS)
+
+all: libsring.a
+
+clean:
+	rm -rf $(SRING-OBJS) libsring.a
+
+.PHONY: all clean