Justin T. Gibbs
2012-Feb-03 05:24 UTC
[PATCH 0 of 5] blkif.h: Document protocol and existing extensions
This patch series attempts to document the blkif PV interface and the various extensions to it that are out in the wild. I assembled this information by reviewing xend, various versions of the Linux blkif drivers, fixing bugs in the FreeBSD blkfront driver, and writing and testing a blkback driver for FreeBSD. Even with this experience, given this interface was previously only documented in source code, I''m sure there are mistakes or ommissions in what I''ve done here. Corrections welcome. All but the last patch in the series is source compatible with existing drivers. This final patch adds FreeBSD''s extension to allow an I/O to span multiple ring entries. The number of outstanding requests, and the max I/O size and number of segments per request, can all be negotiated. Drivers offering this extension are backwards compatible with existing drivers, but the definition of BLKIF_MAX_SEGMENTS_PER_REQUEST has been changed to reflect the new limit of 255. A global replacement of BLKIF_MAX_SEGMENTS_PER_REQUEST with BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK is all that''s required to adjust drivers without the extension to this header change. -- Justin xen/include/public/io/blkif.h | 9 +- xen/include/public/io/blkif.h | 305 ++++++++++++++++++++++++++++++++++------- xen/include/public/io/blkif.h | 22 +++ xen/include/public/io/blkif.h | 101 +++++++++++- xen/include/public/io/blkif.h | 106 ++++++++++++++- 5 files changed, 466 insertions(+), 77 deletions(-)
xen/include/public/io/blkif.h | 9 ++++----- 1 files changed, 4 insertions(+), 5 deletions(-) o Remove trailing whitespace. o Remove a blank line so that a comment block is adjacent to the code it documents. No functional changes. Signed-off-by: Justin T. Gibbs <justing@spectralogic.com> diff -r e2722b24dc09 -r a56382286b67 xen/include/public/io/blkif.h --- a/xen/include/public/io/blkif.h Thu Jan 26 17:43:31 2012 +0000 +++ b/xen/include/public/io/blkif.h Thu Feb 02 16:19:57 2012 -0700 @@ -1,8 +1,8 @@ /****************************************************************************** * blkif.h - * + * * Unified block-device I/O interface for Xen guest OSes. - * + * * Permission is hereby granted, free of charge, to any person obtaining a copy * of this software and associated documentation files (the "Software"), to * deal in the Software without restriction, including without limitation the @@ -35,7 +35,7 @@ * notification can be made conditional on req_event (i.e., the generic * hold-off mechanism provided by the ring macros). Backends must set * req_event appropriately (e.g., using RING_FINAL_CHECK_FOR_REQUESTS()). - * + * * Back->front notifications: When enqueuing a new response, sending a * notification can be made conditional on rsp_event (i.e., the generic * hold-off mechanism provided by the ring macros). Frontends must set @@ -133,7 +133,7 @@ */ #define BLKIF_MAX_SEGMENTS_PER_REQUEST 11 -/* +/* * NB. first_sect and last_sect in blkif_request_segment, as well as * sector_number in blkif_request, are always expressed in 512-byte units. * However they must be properly aligned to the real sector size of the @@ -192,7 +192,6 @@ typedef struct blkif_response blkif_resp /* * Generate blkif ring structures and types. */ - DEFINE_RING_TYPES(blkif, struct blkif_request, struct blkif_response); #define VDISK_CDROM 0x1
Justin T. Gibbs
2012-Feb-03 05:24 UTC
[PATCH 2 of 5] blkif.h: Provide more complete documentation of the blkif interface
xen/include/public/io/blkif.h | 305 ++++++++++++++++++++++++++++++++++------- 1 files changed, 250 insertions(+), 55 deletions(-) o Document the XenBus nodes used in this protocol o Add a state diagram illustrating the roles and responsibilities of both the front and backend during startup. o Correct missed BLKIF_OP_TRIM => BLKIF_OP_DISCARD conversion in a comment. No functional changes. Signed-off-by: Justin T. Gibbs <justing@spectralogic.com> diff -r a56382286b67 -r a1b6e68ff241 xen/include/public/io/blkif.h --- a/xen/include/public/io/blkif.h Thu Feb 02 16:19:57 2012 -0700 +++ b/xen/include/public/io/blkif.h Thu Feb 02 16:19:57 2012 -0700 @@ -22,6 +22,7 @@ * DEALINGS IN THE SOFTWARE. * * Copyright (c) 2003-2004, Keir Fraser + * Copyright (c) 2012, Spectra Logic Corporation */ #ifndef __XEN_PUBLIC_IO_BLKIF_H__ @@ -48,32 +49,246 @@ #define blkif_sector_t uint64_t /* + * Feature and Parameter Negotiation + * ================================+ * The two halves of a Xen block driver utilize nodes within the XenStore to + * communicate capabilities and to negotiate operating parameters. This + * section enumerates these nodes which reside in the respective front and + * backend portions of the XenStore, following the XenBus convention. + * + * All data in the XenStore is stored as strings. Nodes specifying numeric + * values are encoded in decimal. Integer value ranges listed below are + * expressed as fixed sized integer types capable of storing the conversion + * of a properly formated node string, without loss of information. + * + * Any specified default value is in effect if the corresponding XenBus node + * is not present in the XenStore. + * + * See the XenBus state transition diagram below for details on when XenBus + * nodes must be published and when they can be queried. + * + ***************************************************************************** + * Backend XenBus Nodes + ***************************************************************************** + * + *--------------------------------- Features --------------------------------- + * + * feature-barrier + * Values: 0/1 (boolean) + * Default Value: 0 + * + * A value of "1" indicates that the backend can process requests + * containing the BLKIF_OP_WRITE_BARRIER request opcode. Requests + * of this type may still be returned at any time with the + * BLKIF_RSP_EOPNOTSUPP result code. + * + * feature-flush-cache + * Values: 0/1 (boolean) + * Default Value: 0 + * + * A value of "1" indicates that the backend can process requests + * containing the BLKIF_OP_FLUSH_DISKCACHE request opcode. Requests + * of this type may still be returned at any time with the + * BLKIF_RSP_EOPNOTSUPP result code. + * + * feature-discard + * Values: 0/1 (boolean) + * Default Value: 0 + * + * A value of "1" indicates that the backend can process requests + * containing the BLKIF_OP_DISCARD request opcode. Requests + * of this type may still be returned at any time with the + * BLKIF_RSP_EOPNOTSUPP result code. + * + *----------------------- Backend Device Identification ----------------------- + * mode + * Values: "r" (read only), "w" (writable) + * + * The read or write access permissions to the backing store to be + * granted to the frontend. + * + * params + * Values: string + * + * A free formatted string providing sufficient information for the + * backend driver to open the backing device. (e.g. the path to the + * file or block device representing the backing store.) + * + * type + * Values: "file", "phy", "tap" + * + * The type of the backing device/object. + * + *------------------------- Backend Device Properties ------------------------- + * + * discard-aligment + * Values: <uint32_t> + * Default Value: 0 + * Notes: 1, 2 + * + * The offset, in bytes from the beginning of the virtual block device, + * to the first, addressable, discard extent on the underlying device. + * + * discard-granularity + * Values: <uint32_t> + * Default Value: <"sector-size"> + * Notes: 1 + * + * The size, in bytes, of the individually addressable discard extents + * of the underlying device. + * + * discard-secure + * Values: 0/1 (boolean) + * Default Value: 0 + * + * A value of "1" indicates that the backend can process BLKIF_OP_DISCARD + * requests with the BLKIF_DISCARD_SECURE flag set. + * + * info + * Values: <uint32_t> (bitmap) + * + * A collection of bit flags describing attributes of the backing + * device. The VDISK_* macros define the meaning of each bit + * location. + * + * sector-size + * Values: <uint32_t> + * + * The native sector size, in bytes, of the backend device. + * + * sectors + * Values: <uint64_t> + * + * The size of the backend device, expressed in units of its native + * sector size ("sector-size"). + * + ***************************************************************************** + * Frontend XenBus Nodes + ***************************************************************************** + * + *----------------------- Request Transport Parameters ----------------------- + * + * event-channel + * Values: <uint32_t> + * + * The identifier of the Xen event channel used to signal activity + * in the ring buffer. + * + * ring-ref + * Values: <uint32_t> + * + * The Xen grant reference granting permission for the backend to map + * the sole page in a single page sized ring buffer. + * + * protocol + * Values: string (XEN_IO_PROTO_ABI_*) + * Default Value: XEN_IO_PROTO_ABI_NATIVE + * + * The machine ABI rules governing the format of all ring request and + * response structures. + * + *------------------------- Virtual Device Properties ------------------------- + * + * device-type + * Values: "disk", "cdrom", "floppy", etc. + * + * virtual-device + * Values: <uint16_t> (XEN_*_MAJOR << 8 | Minor) + * + * A value indicating the physical device to virtualize within the + * frontend''s domain. (e.g. "The first ATA disk", "The third SCSI + * disk", etc.) + * + * Notes + * ----- + * (1) Devices that support discard functionality may internally allocate + * space (discardable extents) in units that are larger than the + * exported logical block size. + * (2) The discard-alignment parameter allows a physical device to be + * partitioned into virtual devices that do not necessarily begin or + * end on a discardable extent boundary. + */ + +/* + * STATE DIAGRAMS + * + ***************************************************************************** + * Startup * + ***************************************************************************** + * + * Tool stack creates front and back nodes with state XenbusStateInitialising. + * + * Front Back + * ================================= ====================================+ * XenbusStateInitialising XenbusStateInitialising + * o Query virtual device o Query backend device identification + * properties. data. + * o Setup OS device instance. o Open and validate backend device. + * o Publish backend features. + * | + * | + * V + * XenbusStateInitWait + * + * o Query backend features. + * o Allocate and initialize the + * request ring. + * | + * | + * V + * XenbusStateInitialised + * + * o Connect to the request ring and + * event channel. + * o Publish backend device properties. + * | + * | + * V + * XenbusStateConnected + * + * o Query backend device properties. + * o Finalize OS virtual device + * instance. + * | + * | + * V + * XenbusStateConnected + * + * Note: Drivers that do not support any optional features can skip certain + * states in the state machine: + * + * o A frontend may transition to XenbusStateInitialised without + * waiting for the backend to enter XenbusStateInitWait. + * + * o A backend may transition to XenbusStateInitialised, bypassing + * XenbusStateInitWait, without waiting for the frontend to first + * enter the XenbusStateInitialised state. + * + * Drivers that support optional features must tolerate these additional + * state transition paths. In general this means performing the work of + * any skipped state transition, if it has not already been performed, + * in addition to the work associated with entry into the current state. + */ + +/* * REQUEST CODES. */ #define BLKIF_OP_READ 0 #define BLKIF_OP_WRITE 1 /* - * Recognised only if "feature-barrier" is present in backend xenbus info. - * The "feature-barrier" node contains a boolean indicating whether barrier - * requests are likely to succeed or fail. Either way, a barrier request - * may fail at any time with BLKIF_RSP_EOPNOTSUPP if it is unsupported by - * the underlying block-device hardware. The boolean simply indicates whether - * or not it is worthwhile for the frontend to attempt barrier requests. - * If a backend does not recognise BLKIF_OP_WRITE_BARRIER, it should *not* - * create the "feature-barrier" node! + * All writes issued prior to a request with the BLKIF_OP_WRITE_BARRIER + * operation code ("barrier request") must be completed prior to the + * execution of the barrier request. All writes issued after the barrier + * request must not execute until after the completion of the barrier request. + * + * Optional. See "feature-barrier" XenBus node documentation above. */ #define BLKIF_OP_WRITE_BARRIER 2 /* - * Recognised if "feature-flush-cache" is present in backend xenbus - * info. A flush will ask the underlying storage hardware to flush its - * non-volatile caches as appropriate. The "feature-flush-cache" node - * contains a boolean indicating whether flush requests are likely to - * succeed or fail. Either way, a flush request may fail at any time - * with BLKIF_RSP_EOPNOTSUPP if it is unsupported by the underlying - * block-device hardware. The boolean simply indicates whether or not it - * is worthwhile for the frontend to attempt flushes. If a backend does - * not recognise BLKIF_OP_WRITE_FLUSH_CACHE, it should *not* create the - * "feature-flush-cache" node! + * Commit any uncommitted contents of the backing device''s volatile cache + * to stable storage. + * + * Optional. See "feature-flush-cache" XenBus node documentation above. */ #define BLKIF_OP_FLUSH_DISKCACHE 3 /* @@ -82,47 +297,24 @@ */ #define BLKIF_OP_RESERVED_1 4 /* - * Recognised only if "feature-discard" is present in backend xenbus info. - * The "feature-discard" node contains a boolean indicating whether trim - * (ATA) or unmap (SCSI) - conviently called discard requests are likely - * to succeed or fail. Either way, a discard request - * may fail at any time with BLKIF_RSP_EOPNOTSUPP if it is unsupported by - * the underlying block-device hardware. The boolean simply indicates whether - * or not it is worthwhile for the frontend to attempt discard requests. - * If a backend does not recognise BLKIF_OP_DISCARD, it should *not* - * create the "feature-discard" node! + * Indicate to the backend device that a region of storage is no longer in + * use, and may be discarded at any time without impact to the client. If + * the BLKIF_DISCARD_SECURE flag is set on the request, all copies of the + * discarded region on the device must be rendered unrecoverable before the + * command returns. * - * Discard operation is a request for the underlying block device to mark - * extents to be erased. However, discard does not guarantee that the blocks - * will be erased from the device - it is just a hint to the device - * controller that these blocks are no longer in use. What the device - * controller does with that information is left to the controller. - * Discard operations are passed with sector_number as the - * sector index to begin discard operations at and nr_sectors as the number of - * sectors to be discarded. The specified sectors should be discarded if the - * underlying block device supports trim (ATA) or unmap (SCSI) operations, - * or a BLKIF_RSP_EOPNOTSUPP should be returned. - * More information about trim/unmap operations at: + * This operation is analogous to performing a trim (ATA) or unamp (SCSI), + * command on a native device. + * + * More information about trim/unmap operations can be found at: * http://t13.org/Documents/UploadedDocuments/docs2008/ * e07154r6-Data_Set_Management_Proposal_for_ATA-ACS2.doc * http://www.seagate.com/staticfiles/support/disc/manuals/ * Interface%20manuals/100293068c.pdf - * The backend can optionally provide these extra XenBus attributes to - * further optimize the discard functionality: - * ''discard-aligment'' - Devices that support discard functionality may - * internally allocate space in units that are bigger than the exported - * logical block size. The discard-alignment parameter indicates how many bytes - * the beginning of the partition is offset from the internal allocation unit''s - * natural alignment. Do not confuse this with natural disk alignment offset. - * ''discard-granularity'' - Devices that support discard functionality may - * internally allocate space using units that are bigger than the logical block - * size. The discard-granularity parameter indicates the size of the internal - * allocation unit in bytes if reported by the device. Otherwise the - * discard-granularity will be set to match the device''s physical block size. - * It is the minimum size you can discard. - * ''discard-secure'' - All copies of the discarded sectors (potentially created - * by garbage collection) must also be erased. To use this feature, the flag - * BLKIF_DISCARD_SECURE must be set in the blkif_request_discard. + * + * Optional. See "feature-discard", "discard-alignment", + * "discard-granularity", and "discard-secure" in the XenBus node + * documentation above. */ #define BLKIF_OP_DISCARD 5 @@ -147,6 +339,9 @@ struct blkif_request_segment { uint8_t first_sect, last_sect; }; +/* + * Starting ring element for any I/O request. + */ struct blkif_request { uint8_t operation; /* BLKIF_OP_??? */ uint8_t nr_segments; /* number of segments */ @@ -158,7 +353,7 @@ struct blkif_request { typedef struct blkif_request blkif_request_t; /* - * Cast to this structure when blkif_request.operation == BLKIF_OP_TRIM + * Cast to this structure when blkif_request.operation == BLKIF_OP_DISCARD * sizeof(struct blkif_request_discard) <= sizeof(struct blkif_request) */ struct blkif_request_discard {
Justin T. Gibbs
2012-Feb-03 05:24 UTC
[PATCH 3 of 5] blkif.h: Add definitions for virtual block device major numbers
xen/include/public/io/blkif.h | 22 ++++++++++++++++++++++ 1 files changed, 22 insertions(+), 0 deletions(-) Signed-off-by: Justin T. Gibbs <justing@spectralogic.com> diff -r a1b6e68ff241 -r 65403351f4b4 xen/include/public/io/blkif.h --- a/xen/include/public/io/blkif.h Thu Feb 02 16:19:57 2012 -0700 +++ b/xen/include/public/io/blkif.h Thu Feb 02 16:19:57 2012 -0700 @@ -393,6 +393,28 @@ DEFINE_RING_TYPES(blkif, struct blkif_re #define VDISK_REMOVABLE 0x2 #define VDISK_READONLY 0x4 +/* + * Xen-defined major numbers for virtual block devices. + */ +#define XEN_IDE0_MAJOR 3 +#define XEN_IDE1_MAJOR 22 +#define XEN_SCSI_DISK0_MAJOR 8 +#define XEN_SCSI_DISK1_MAJOR 65 +#define XEN_SCSI_DISK2_MAJOR 66 +#define XEN_SCSI_DISK3_MAJOR 67 +#define XEN_SCSI_DISK4_MAJOR 68 +#define XEN_SCSI_DISK5_MAJOR 69 +#define XEN_SCSI_DISK6_MAJOR 70 +#define XEN_SCSI_DISK7_MAJOR 71 +#define XEN_SCSI_DISK8_MAJOR 128 +#define XEN_SCSI_DISK9_MAJOR 129 +#define XEN_SCSI_DISK10_MAJOR 130 +#define XEN_SCSI_DISK11_MAJOR 131 +#define XEN_SCSI_DISK12_MAJOR 132 +#define XEN_SCSI_DISK13_MAJOR 133 +#define XEN_SCSI_DISK14_MAJOR 134 +#define XEN_SCSI_DISK15_MAJOR 135 + #endif /* __XEN_PUBLIC_IO_BLKIF_H__ */ /*
Justin T. Gibbs
2012-Feb-03 05:24 UTC
[PATCH 4 of 5] blkif.h: Document the RedHat and Citrix blkif multi-page ring extensions
xen/include/public/io/blkif.h | 101 ++++++++++++++++++++++++++++++++++++----- 1 files changed, 87 insertions(+), 14 deletions(-) No functional changes. Signed-off-by: Justin T. Gibbs <justing@spectralogic.com> diff -r 65403351f4b4 -r c3609ad53946 xen/include/public/io/blkif.h --- a/xen/include/public/io/blkif.h Thu Feb 02 16:19:57 2012 -0700 +++ b/xen/include/public/io/blkif.h Thu Feb 02 16:19:57 2012 -0700 @@ -100,6 +100,25 @@ * of this type may still be returned at any time with the * BLKIF_RSP_EOPNOTSUPP result code. * + *----------------------- Request Transport Parameters ------------------------ + * + * max-ring-page-order + * Values: <uint32_t> + * Default Value: 0 + * Notes: 1, 3 + * + * The maximum supported size of the request ring buffer in units of + * lb(machine pages). (e.g. 0 == 1 page, 1 = 2 pages, 2 == 4 pages, + * etc.). + * + * max-ring-pages + * Values: <uint32_t> + * Default Value: 1 + * Notes: 2, 3 + * + * The maximum supported size of the request ring buffer in units of + * machine pages. The value must be a power of 2. + * *----------------------- Backend Device Identification ----------------------- * mode * Values: "r" (read only), "w" (writable) @@ -124,7 +143,7 @@ * discard-aligment * Values: <uint32_t> * Default Value: 0 - * Notes: 1, 2 + * Notes: 4, 5 * * The offset, in bytes from the beginning of the virtual block device, * to the first, addressable, discard extent on the underlying device. @@ -132,7 +151,7 @@ * discard-granularity * Values: <uint32_t> * Default Value: <"sector-size"> - * Notes: 1 + * Notes: 4 * * The size, in bytes, of the individually addressable discard extents * of the underlying device. @@ -176,10 +195,20 @@ * * ring-ref * Values: <uint32_t> + * Notes: 6 * * The Xen grant reference granting permission for the backend to map * the sole page in a single page sized ring buffer. * + * ring-ref%u + * Values: <uint32_t> + * Notes: 6 + * + * For a frontend providing a multi-page ring, a "ring-pages" sized + * list of nodes, each containing a Xen grant reference granting + * permission for the backend to map the page of the ring located + * at page index "%u". Page indexes are zero based. + * * protocol * Values: string (XEN_IO_PROTO_ABI_*) * Default Value: XEN_IO_PROTO_ABI_NATIVE @@ -187,6 +216,25 @@ * The machine ABI rules governing the format of all ring request and * response structures. * + * ring-page-order + * Values: <uint32_t> + * Default Value: 0 + * Maximum Value: MAX(ffs(max-ring-pages) - 1, max-ring-page-order) + * Notes: 1, 3 + * + * The size of the frontend allocated request ring buffer in units + * of lb(machine pages). (e.g. 0 == 1 page, 1 = 2 pages, 2 == 4 pages, + * etc.). + * + * ring-pages + * Values: <uint32_t> + * Default Value: 1 + * Maximum Value: MAX(max-ring-pages,(0x1 << max-ring-page-order)) + * Notes: 2, 3 + * + * The size of the frontend allocated request ring buffer in units of + * machine pages. The value must be a power of 2. + * *------------------------- Virtual Device Properties ------------------------- * * device-type @@ -201,12 +249,25 @@ * * Notes * ----- - * (1) Devices that support discard functionality may internally allocate + * (1) Multi-page ring buffer scheme first developed in the Citrix XenServer + * PV drivers. + * (2) Multi-page ring buffer scheme first used in some RedHat distributions + * including a distribution deployed on certain nodes of the Amazon + * EC2 cluster. + * (3) Support for multi-page ring buffers was implemented independently, + * in slightly different forms, by both Citrix and RedHat/Amazon. + * For full interoperability, block front and backends should support + * both methods of negotiating this capability. + * (4) Devices that support discard functionality may internally allocate * space (discardable extents) in units that are larger than the * exported logical block size. - * (2) The discard-alignment parameter allows a physical device to be + * (5) The discard-alignment parameter allows a physical device to be * partitioned into virtual devices that do not necessarily begin or * end on a discardable extent boundary. + * (6) When there is only a single page allocated to the request ring, + * ''ring-ref'' is used to communicate the grant reference for this + * page to the backend. When using a multi-page ring, the ''ring-ref'' + * node is not created. Instead ''ring-ref0'' - ''ring-refN'' are used. */ /* @@ -224,20 +285,26 @@ * o Query virtual device o Query backend device identification * properties. data. * o Setup OS device instance. o Open and validate backend device. - * o Publish backend features. + * o Publish backend features and + * transport parameters. * | * | * V * XenbusStateInitWait * - * o Query backend features. + * o Query backend features and + * transport parameters. * o Allocate and initialize the * request ring. + * o Publish transport parameters + * that will be in effect during + * this connection. * | * | * V * XenbusStateInitialised * + * o Query frontend transport parameters. * o Connect to the request ring and * event channel. * o Publish backend device properties. @@ -254,20 +321,26 @@ * V * XenbusStateConnected * - * Note: Drivers that do not support any optional features can skip certain - * states in the state machine: + * Note: Drivers that do not support any optional features, or the negotiation + * of transport parameters, can skip certain states in the state machine: * * o A frontend may transition to XenbusStateInitialised without - * waiting for the backend to enter XenbusStateInitWait. + * waiting for the backend to enter XenbusStateInitWait. In this + * case, default transport parameters are in effect and any + * transport parameters published by the frontend must contain + * their default values. * * o A backend may transition to XenbusStateInitialised, bypassing * XenbusStateInitWait, without waiting for the frontend to first - * enter the XenbusStateInitialised state. + * enter the XenbusStateInitialised state. In this case, default + * transport parameters are in effect and any transport parameters + * published by the backend must contain their default values. * - * Drivers that support optional features must tolerate these additional - * state transition paths. In general this means performing the work of - * any skipped state transition, if it has not already been performed, - * in addition to the work associated with entry into the current state. + * Drivers that support optional features and/or transport parameter + * negotiation must tolerate these additional state transition paths. + * In general this means performing the work of any skipped state + * transition, if it has not already been performed, in addition to the + * work associated with entry into the current state. */ /*
Justin T. Gibbs
2012-Feb-03 05:24 UTC
[PATCH 5 of 5] blkif.h: Define and document the request number/size/segments extension
xen/include/public/io/blkif.h | 106 ++++++++++++++++++++++++++++++++++++++++- 1 files changed, 103 insertions(+), 3 deletions(-) Note: The definition of BLKIF_MAX_SEGMENTS_PER_REQUEST has changed. Drivers must be updated to, at minimum, use BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK, before being compatible with this header file. This extension first appeared in FreeBSD. Signed-off-by: Justin T. Gibbs <justing@spectralogic.com> diff -r c3609ad53946 -r f5c2e866c661 xen/include/public/io/blkif.h --- a/xen/include/public/io/blkif.h Thu Feb 02 16:19:57 2012 -0700 +++ b/xen/include/public/io/blkif.h Thu Feb 02 16:19:57 2012 -0700 @@ -119,6 +119,29 @@ * The maximum supported size of the request ring buffer in units of * machine pages. The value must be a power of 2. * + * max-requests <uint32_t> + * Default Value: BLKIF_MAX_RING_REQUESTS(PAGE_SIZE) + * Maximum Value: BLKIF_MAX_RING_REQUESTS(PAGE_SIZE * max-ring-pages) + * + * The maximum number of concurrent requests supported by the backend. + * + * max-request-segments + * Values: <uint8_t> + * Default Value: BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK + * Maximum Value: 255 + * + * The maximum value of blkif_request.nr_segments supported by + * the backend. + * + * max-request-size + * Values: <uint32_t> + * Default Value: BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK * PAGE_SIZE + * Maximum Value: 255 * PAGE_SIZE + * + * The maximum amount of data, in bytes, that can be referenced by a + * request type that accesses frontend memory (currently BLKIF_OP_READ, + * BLKIF_OP_WRITE, or BLKIF_OP_WRITE_BARRIER). + * *----------------------- Backend Device Identification ----------------------- * mode * Values: "r" (read only), "w" (writable) @@ -235,6 +258,31 @@ * The size of the frontend allocated request ring buffer in units of * machine pages. The value must be a power of 2. * + * max-requests + * Values: <uint32_t> + * Default Value: BLKIF_MAX_RING_REQUESTS(PAGE_SIZE) + * Maximum Value: BLKIF_MAX_RING_REQUESTS(PAGE_SIZE * max-ring_pages) + * + * The maximum number of concurrent requests that will be issued by + * the frontend. + * + * max-request-segments + * Values: <uint8_t> + * Default Value: BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK + * Maximum Value: MIN(255, backend/max-request-segments) + * + * The maximum value the frontend will set in the + * blkif_request.nr_segments field. + * + * max-request-size + * Values: <uint32_t> + * Default Value: BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK * PAGE_SIZE + * Maximum Value: max-request-segments * PAGE_SIZE + * + * The maximum amount of data, in bytes, that can be referenced by + * a request type that accesses frontend memory (currently BLKIF_OP_READ, + * BLKIF_OP_WRITE, or BLKIF_OP_WRITE_BARRIER). + * *------------------------- Virtual Device Properties ------------------------- * * device-type @@ -392,11 +440,21 @@ #define BLKIF_OP_DISCARD 5 /* - * Maximum scatter/gather segments per request. + * Maximum scatter/gather segments associated with a request header block. * This is carefully chosen so that sizeof(blkif_ring_t) <= PAGE_SIZE. * NB. This could be 12 if the ring indexes weren''t stored in the same page. */ -#define BLKIF_MAX_SEGMENTS_PER_REQUEST 11 +#define BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK 11 + +/* + * Maximum scatter/gather segments associated with a segment block. + */ +#define BLKIF_MAX_SEGMENTS_PER_SEGMENT_BLOCK 14 + +/* + * Maximum scatter/gather segments per request (header + segment blocks). + */ +#define BLKIF_MAX_SEGMENTS_PER_REQUEST 255 /* * NB. first_sect and last_sect in blkif_request_segment, as well as @@ -411,9 +469,25 @@ struct blkif_request_segment { /* @last_sect: last sector in frame to transfer (inclusive). */ uint8_t first_sect, last_sect; }; +typedef struct blkif_request_segment blkif_request_segment_t; /* * Starting ring element for any I/O request. + * + * One or more segment blocks can be inserted into the request ring + * just after a blkif_request_t, allowing requests to operate on + * up to BLKIF_MAX_SEGMENTS_PER_REQUEST. + * + * BLKIF_SEGS_TO_BLOCKS() can be used on blkif_requst.nr_segments + * to determine the number of contiguous ring entries associated + * with this request. + * + * Note: Due to the way Xen request rings operate, the producer and + * consumer indices of the ring must be incremented by the + * BLKIF_SEGS_TO_BLOCKS() value of the associated request. + * (e.g. a response to a 3 ring entry request must also consume + * 3 entries in the ring, even though only the first ring entry + * in the response has any data.) */ struct blkif_request { uint8_t operation; /* BLKIF_OP_??? */ @@ -421,11 +495,22 @@ struct blkif_request { blkif_vdev_t handle; /* only for read/write requests */ uint64_t id; /* private guest value, echoed in resp */ blkif_sector_t sector_number;/* start sector idx on disk (r/w only) */ - struct blkif_request_segment seg[BLKIF_MAX_SEGMENTS_PER_REQUEST]; + blkif_request_segment_t seg[BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK]; }; typedef struct blkif_request blkif_request_t; /* + * A segment block is a ring request structure that contains only + * segment data. + * + * sizeof(struct blkif_segment_block) <= sizeof(struct blkif_request) + */ +struct blkif_segment_block { + blkif_request_segment_t seg[BLKIF_MAX_SEGMENTS_PER_SEGMENT_BLOCK]; +}; +typedef struct blkif_segment_block blkif_segment_block_t; + +/* * Cast to this structure when blkif_request.operation == BLKIF_OP_DISCARD * sizeof(struct blkif_request_discard) <= sizeof(struct blkif_request) */ @@ -462,6 +547,21 @@ typedef struct blkif_response blkif_resp */ DEFINE_RING_TYPES(blkif, struct blkif_request, struct blkif_response); +/* + * Index to, and treat as a segment block, an entry in the ring. + */ +#define BLKRING_GET_SEG_BLOCK(_r, _idx) \ + (((blkif_segment_block_t *)RING_GET_REQUEST(_r, _idx))->seg) + +/* + * The number of ring request blocks required to handle an I/O + * request containing _segs segments. + */ +#define BLKIF_SEGS_TO_BLOCKS(_segs) \ + ((((_segs - BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK) \ + + (BLKIF_MAX_SEGMENTS_PER_SEGMENT_BLOCK - 1)) \ + / BLKIF_MAX_SEGMENTS_PER_SEGMENT_BLOCK) + /*header_block*/1) + #define VDISK_CDROM 0x1 #define VDISK_REMOVABLE 0x2 #define VDISK_READONLY 0x4
Jan Beulich
2012-Feb-03 09:52 UTC
Re: [PATCH 0 of 5] blkif.h: Document protocol and existing extensions
>>> On 03.02.12 at 06:24, "Justin T. Gibbs" <justing@spectralogic.com> wrote: > This patch series attempts to document the blkif PV interface and > the various extensions to it that are out in the wild. I assembled > this information by reviewing xend, various versions of the Linux > blkif drivers, fixing bugs in the FreeBSD blkfront driver, and > writing and testing a blkback driver for FreeBSD.Thanks for doing this!> All but the last patch in the series is source compatible with > existing drivers. This final patch adds FreeBSD''s extension to > allow an I/O to span multiple ring entries. The number of outstanding > requests, and the max I/O size and number of segments per request, > can all be negotiated. Drivers offering this extension are backwards > compatible with existing drivers, but the definition of > BLKIF_MAX_SEGMENTS_PER_REQUEST has been changed to reflect the new > limit of 255. A global replacement of BLKIF_MAX_SEGMENTS_PER_REQUEST > with BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK is all that''s required to > adjust drivers without the extension to this header change.No, this sort of change is just not valid to be done. Changes to public headers must always (default to) be backwards compatible. Jan
Jan Beulich
2012-Feb-03 13:31 UTC
Re: [PATCH 3 of 5] blkif.h: Add definitions for virtual block device major numbers
>>> On 03.02.12 at 06:24, "Justin T. Gibbs" <justing@spectralogic.com> wrote:These being mostly meaningless to non-Linux, I don''t think they belong here. Jan> --- a/xen/include/public/io/blkif.h Thu Feb 02 16:19:57 2012 -0700 > +++ b/xen/include/public/io/blkif.h Thu Feb 02 16:19:57 2012 -0700 > @@ -393,6 +393,28 @@ DEFINE_RING_TYPES(blkif, struct blkif_re > #define VDISK_REMOVABLE 0x2 > #define VDISK_READONLY 0x4 > > +/* > + * Xen-defined major numbers for virtual block devices. > + */ > +#define XEN_IDE0_MAJOR 3 > +#define XEN_IDE1_MAJOR 22 > +#define XEN_SCSI_DISK0_MAJOR 8 > +#define XEN_SCSI_DISK1_MAJOR 65 > +#define XEN_SCSI_DISK2_MAJOR 66 > +#define XEN_SCSI_DISK3_MAJOR 67 > +#define XEN_SCSI_DISK4_MAJOR 68 > +#define XEN_SCSI_DISK5_MAJOR 69 > +#define XEN_SCSI_DISK6_MAJOR 70 > +#define XEN_SCSI_DISK7_MAJOR 71 > +#define XEN_SCSI_DISK8_MAJOR 128 > +#define XEN_SCSI_DISK9_MAJOR 129 > +#define XEN_SCSI_DISK10_MAJOR 130 > +#define XEN_SCSI_DISK11_MAJOR 131 > +#define XEN_SCSI_DISK12_MAJOR 132 > +#define XEN_SCSI_DISK13_MAJOR 133 > +#define XEN_SCSI_DISK14_MAJOR 134 > +#define XEN_SCSI_DISK15_MAJOR 135 > + > #endif /* __XEN_PUBLIC_IO_BLKIF_H__ */ > > /* > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel
Jan Beulich
2012-Feb-03 13:33 UTC
Re: [PATCH 4 of 5] blkif.h: Document the RedHat and Citrix blkif multi-page ring extensions
>>> On 03.02.12 at 06:24, "Justin T. Gibbs" <justing@spectralogic.com> wrote: > @@ -187,6 +216,25 @@ > * The machine ABI rules governing the format of all ring request and > * response structures. > * > + * ring-page-order > + * Values: <uint32_t> > + * Default Value: 0 > + * Maximum Value: MAX(ffs(max-ring-pages) - 1, max-ring-page-order)Why not just max-ring-page-order. After all, the value is meaningless for a backend that only exposes max-ring-pages.> + * Notes: 1, 3 > + * > + * The size of the frontend allocated request ring buffer in units > + * of lb(machine pages). (e.g. 0 == 1 page, 1 = 2 pages, 2 == 4 pages, > + * etc.). > + * > + * ring-pages > + * Values: <uint32_t> > + * Default Value: 1 > + * Maximum Value: MAX(max-ring-pages,(0x1 << max-ring-page-order))Similarly here - just max-ring-pages should do.> + * Notes: 2, 3 > + * > + * The size of the frontend allocated request ring buffer in units of > + * machine pages. The value must be a power of 2. > + * > *------------------------- Virtual Device Properties ------------------------- > * > * device-typeJan
Jan Beulich
2012-Feb-03 13:34 UTC
Re: [PATCH 5 of 5] blkif.h: Define and document the request number/size/segments extension
>>> On 03.02.12 at 06:24, "Justin T. Gibbs" <justing@spectralogic.com> wrote: > xen/include/public/io/blkif.h | 106 ++++++++++++++++++++++++++++++++++++++++- > 1 files changed, 103 insertions(+), 3 deletions(-) > > > Note: The definition of BLKIF_MAX_SEGMENTS_PER_REQUEST has changed. > Drivers must be updated to, at minimum, use > BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK, before being compatible > with this header file.NAK. No backwards incompatible changes allowed in public headers. Jan> This extension first appeared in FreeBSD. > > Signed-off-by: Justin T. Gibbs <justing@spectralogic.com> > > diff -r c3609ad53946 -r f5c2e866c661 xen/include/public/io/blkif.h > --- a/xen/include/public/io/blkif.h Thu Feb 02 16:19:57 2012 -0700 > +++ b/xen/include/public/io/blkif.h Thu Feb 02 16:19:57 2012 -0700 > @@ -119,6 +119,29 @@ > * The maximum supported size of the request ring buffer in units of > * machine pages. The value must be a power of 2. > * > + * max-requests <uint32_t> > + * Default Value: BLKIF_MAX_RING_REQUESTS(PAGE_SIZE) > + * Maximum Value: BLKIF_MAX_RING_REQUESTS(PAGE_SIZE * max-ring-pages) > + * > + * The maximum number of concurrent requests supported by the backend. > + * > + * max-request-segments > + * Values: <uint8_t> > + * Default Value: BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK > + * Maximum Value: 255 > + * > + * The maximum value of blkif_request.nr_segments supported by > + * the backend. > + * > + * max-request-size > + * Values: <uint32_t> > + * Default Value: BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK * PAGE_SIZE > + * Maximum Value: 255 * PAGE_SIZE > + * > + * The maximum amount of data, in bytes, that can be referenced by a > + * request type that accesses frontend memory (currently > BLKIF_OP_READ, > + * BLKIF_OP_WRITE, or BLKIF_OP_WRITE_BARRIER). > + * > *----------------------- Backend Device Identification ----------------------- > * mode > * Values: "r" (read only), "w" (writable) > @@ -235,6 +258,31 @@ > * The size of the frontend allocated request ring buffer in units of > * machine pages. The value must be a power of 2. > * > + * max-requests > + * Values: <uint32_t> > + * Default Value: BLKIF_MAX_RING_REQUESTS(PAGE_SIZE) > + * Maximum Value: BLKIF_MAX_RING_REQUESTS(PAGE_SIZE * max-ring_pages) > + * > + * The maximum number of concurrent requests that will be issued by > + * the frontend. > + * > + * max-request-segments > + * Values: <uint8_t> > + * Default Value: BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK > + * Maximum Value: MIN(255, backend/max-request-segments) > + * > + * The maximum value the frontend will set in the > + * blkif_request.nr_segments field. > + * > + * max-request-size > + * Values: <uint32_t> > + * Default Value: BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK * PAGE_SIZE > + * Maximum Value: max-request-segments * PAGE_SIZE > + * > + * The maximum amount of data, in bytes, that can be referenced by > + * a request type that accesses frontend memory (currently > BLKIF_OP_READ, > + * BLKIF_OP_WRITE, or BLKIF_OP_WRITE_BARRIER). > + * > *------------------------- Virtual Device Properties ------------------------- > * > * device-type > @@ -392,11 +440,21 @@ > #define BLKIF_OP_DISCARD 5 > > /* > - * Maximum scatter/gather segments per request. > + * Maximum scatter/gather segments associated with a request header block. > * This is carefully chosen so that sizeof(blkif_ring_t) <= PAGE_SIZE. > * NB. This could be 12 if the ring indexes weren''t stored in the same > page. > */ > -#define BLKIF_MAX_SEGMENTS_PER_REQUEST 11 > +#define BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK 11 > + > +/* > + * Maximum scatter/gather segments associated with a segment block. > + */ > +#define BLKIF_MAX_SEGMENTS_PER_SEGMENT_BLOCK 14 > + > +/* > + * Maximum scatter/gather segments per request (header + segment blocks). > + */ > +#define BLKIF_MAX_SEGMENTS_PER_REQUEST 255 > > /* > * NB. first_sect and last_sect in blkif_request_segment, as well as > @@ -411,9 +469,25 @@ struct blkif_request_segment { > /* @last_sect: last sector in frame to transfer (inclusive). */ > uint8_t first_sect, last_sect; > }; > +typedef struct blkif_request_segment blkif_request_segment_t; > > /* > * Starting ring element for any I/O request. > + * > + * One or more segment blocks can be inserted into the request ring > + * just after a blkif_request_t, allowing requests to operate on > + * up to BLKIF_MAX_SEGMENTS_PER_REQUEST. > + * > + * BLKIF_SEGS_TO_BLOCKS() can be used on blkif_requst.nr_segments > + * to determine the number of contiguous ring entries associated > + * with this request. > + * > + * Note: Due to the way Xen request rings operate, the producer and > + * consumer indices of the ring must be incremented by the > + * BLKIF_SEGS_TO_BLOCKS() value of the associated request. > + * (e.g. a response to a 3 ring entry request must also consume > + * 3 entries in the ring, even though only the first ring entry > + * in the response has any data.) > */ > struct blkif_request { > uint8_t operation; /* BLKIF_OP_??? */ > @@ -421,11 +495,22 @@ struct blkif_request { > blkif_vdev_t handle; /* only for read/write requests */ > uint64_t id; /* private guest value, echoed in resp */ > blkif_sector_t sector_number;/* start sector idx on disk (r/w only) */ > - struct blkif_request_segment seg[BLKIF_MAX_SEGMENTS_PER_REQUEST]; > + blkif_request_segment_t seg[BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK]; > }; > typedef struct blkif_request blkif_request_t; > > /* > + * A segment block is a ring request structure that contains only > + * segment data. > + * > + * sizeof(struct blkif_segment_block) <= sizeof(struct blkif_request) > + */ > +struct blkif_segment_block { > + blkif_request_segment_t seg[BLKIF_MAX_SEGMENTS_PER_SEGMENT_BLOCK]; > +}; > +typedef struct blkif_segment_block blkif_segment_block_t; > + > +/* > * Cast to this structure when blkif_request.operation == BLKIF_OP_DISCARD > * sizeof(struct blkif_request_discard) <= sizeof(struct blkif_request) > */ > @@ -462,6 +547,21 @@ typedef struct blkif_response blkif_resp > */ > DEFINE_RING_TYPES(blkif, struct blkif_request, struct blkif_response); > > +/* > + * Index to, and treat as a segment block, an entry in the ring. > + */ > +#define BLKRING_GET_SEG_BLOCK(_r, _idx) \ > + (((blkif_segment_block_t *)RING_GET_REQUEST(_r, _idx))->seg) > + > +/* > + * The number of ring request blocks required to handle an I/O > + * request containing _segs segments. > + */ > +#define BLKIF_SEGS_TO_BLOCKS(_segs) \ > + ((((_segs - BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK) \ > + + (BLKIF_MAX_SEGMENTS_PER_SEGMENT_BLOCK - 1)) \ > + / BLKIF_MAX_SEGMENTS_PER_SEGMENT_BLOCK) + /*header_block*/1) > + > #define VDISK_CDROM 0x1 > #define VDISK_REMOVABLE 0x2 > #define VDISK_READONLY 0x4 > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel
Justin Gibbs
2012-Feb-03 14:58 UTC
Re: [PATCH 4 of 5] blkif.h: Document the RedHat and Citrix blkif multi-page ring extensions
On Feb 3, 2012, at 6:33 AM, Jan Beulich wrote:>>>> On 03.02.12 at 06:24, "Justin T. Gibbs" <justing@spectralogic.com> wrote: >> @@ -187,6 +216,25 @@ >> * The machine ABI rules governing the format of all ring request and >> * response structures. >> * >> + * ring-page-order >> + * Values: <uint32_t> >> + * Default Value: 0 >> + * Maximum Value: MAX(ffs(max-ring-pages) - 1, max-ring-page-order) > > Why not just max-ring-page-order. After all, the value is meaningless > for a backend that only exposes max-ring-pages. > >> + * Notes: 1, 3 >> + * >> + * The size of the frontend allocated request ring buffer in units >> + * of lb(machine pages). (e.g. 0 == 1 page, 1 = 2 pages, 2 == 4 pages, >> + * etc.). >> + * >> + * ring-pages >> + * Values: <uint32_t> >> + * Default Value: 1 >> + * Maximum Value: MAX(max-ring-pages,(0x1 << max-ring-page-order)) > > Similarly here - just max-ring-pages should do.This patch documents existing extensions. For a new driver to properly negotiate a multi-page ring with a Dom0 on EC2 (ring-pages) or a Citrix Dom0/guest (ring-page-order), you must use the XenBus node names as I''ve defined them here. -- Justin
Jan Beulich
2012-Feb-03 15:01 UTC
Re: [PATCH 4 of 5] blkif.h: Document the RedHat and Citrix blkif multi-page ring extensions
>>> On 03.02.12 at 15:58, Justin Gibbs <justing@spectralogic.com> wrote: > On Feb 3, 2012, at 6:33 AM, Jan Beulich wrote: > >>>>> On 03.02.12 at 06:24, "Justin T. Gibbs" <justing@spectralogic.com> wrote: >>> @@ -187,6 +216,25 @@ >>> * The machine ABI rules governing the format of all ring request and >>> * response structures. >>> * >>> + * ring-page-order >>> + * Values: <uint32_t> >>> + * Default Value: 0 >>> + * Maximum Value: MAX(ffs(max-ring-pages) - 1, max-ring-page-order) >> >> Why not just max-ring-page-order. After all, the value is meaningless >> for a backend that only exposes max-ring-pages. >> >>> + * Notes: 1, 3 >>> + * >>> + * The size of the frontend allocated request ring buffer in units >>> + * of lb(machine pages). (e.g. 0 == 1 page, 1 = 2 pages, 2 == 4 pages, >>> + * etc.). >>> + * >>> + * ring-pages >>> + * Values: <uint32_t> >>> + * Default Value: 1 >>> + * Maximum Value: MAX(max-ring-pages,(0x1 << max-ring-page-order)) >> >> Similarly here - just max-ring-pages should do. > > This patch documents existing extensions. For a new driver to properly > negotiate a > multi-page ring with a Dom0 on EC2 (ring-pages) or a Citrix Dom0/guest > (ring-page-order), > you must use the XenBus node names as I''ve defined them here.That wasn''t my point. I was exclusively asking the maximum values to get simplified. Jan
Justin Gibbs
2012-Feb-03 15:19 UTC
Re: [PATCH 4 of 5] blkif.h: Document the RedHat and Citrix blkif multi-page ring extensions
On Feb 3, 2012, at 8:01 AM, Jan Beulich wrote:>>>> On 03.02.12 at 15:58, Justin Gibbs <justing@spectralogic.com> wrote: >> On Feb 3, 2012, at 6:33 AM, Jan Beulich wrote: >> >>>>>> On 03.02.12 at 06:24, "Justin T. Gibbs" <justing@spectralogic.com> wrote: >>>> @@ -187,6 +216,25 @@ >>>> * The machine ABI rules governing the format of all ring request and >>>> * response structures. >>>> * >>>> + * ring-page-order >>>> + * Values: <uint32_t> >>>> + * Default Value: 0 >>>> + * Maximum Value: MAX(ffs(max-ring-pages) - 1, max-ring-page-order) >>> >>> Why not just max-ring-page-order. After all, the value is meaningless >>> for a backend that only exposes max-ring-pages. >>> >>>> + * Notes: 1, 3 >>>> + * >>>> + * The size of the frontend allocated request ring buffer in units >>>> + * of lb(machine pages). (e.g. 0 == 1 page, 1 = 2 pages, 2 == 4 pages, >>>> + * etc.). >>>> + * >>>> + * ring-pages >>>> + * Values: <uint32_t> >>>> + * Default Value: 1 >>>> + * Maximum Value: MAX(max-ring-pages,(0x1 << max-ring-page-order)) >>> >>> Similarly here - just max-ring-pages should do. >> >> This patch documents existing extensions. For a new driver to properly >> negotiate a >> multi-page ring with a Dom0 on EC2 (ring-pages) or a Citrix Dom0/guest >> (ring-page-order), >> you must use the XenBus node names as I''ve defined them here. > > That wasn''t my point. I was exclusively asking the maximum values to > get simplified. > > JanI see now. Sorry for misunderstanding. In the introduction section, you''ll see this text: * Any specified default value is in effect if the corresponding XenBus node * is not present in the XenStore. This drastically simplified (shortened) a spec that is already quite long. However, if this is the rule, both types of "max ring size" values are "in effect" even if a back-end does not provide them both. So how do you resolve the conflict? A fully interoperable front should allocate the largest possible ring and advertise that size through both mechanisms in a fully consistent manner. That''s what I was trying to indicate by writing the spec this way. -- Justin
Justin Gibbs
2012-Feb-03 15:49 UTC
Re: [PATCH 3 of 5] blkif.h: Add definitions for virtual block device major numbers
On Feb 3, 2012, at 6:31 AM, Jan Beulich wrote:>>>> On 03.02.12 at 06:24, "Justin T. Gibbs" <justing@spectralogic.com> wrote: > > These being mostly meaningless to non-Linux, I don''t think they > belong here. > > Jan[blkif major number #defines deleted] Front and backends must use some common convention for communicating this virtual device mapping. Since Linux was the first Dom0, all blkif drivers that I''m aware of, regardless of OS platform, reference the Linux values. The mapping really only matters for HVM guests and the hand-off between a QEMU emulated device and the PV driver. To avoid confusion (e.g. name of root mount device, device names in fstab), the PV drivers export alias devices named to match the devices QEMU emulates. The only way to do this successfully is to know the major number scheme used by Dom0/QEMU. -- Justin
Jan Beulich
2012-Feb-03 16:12 UTC
Re: [PATCH 4 of 5] blkif.h: Document the RedHat and Citrix blkif multi-page ring extensions
>>> On 03.02.12 at 16:19, Justin Gibbs <justing@spectralogic.com> wrote: > However, > if this is the rule, both types of "max ring size" values are "in effect" > even if a back-end > does not provide them both. So how do you resolve the conflict? A fully > interoperable > front should allocate the largest possible ring and advertise that size > through both > mechanisms in a fully consistent manner. That''s what I was trying to > indicate by > writing the spec this way.Hmm, I would think a fully compatible frontend should bail (revert to single page ring) on inconsistent max-ring-pages and max-ring-page-order, if both are provided by the backend. The limit for ring-pages should always be max-ring-pages, while the one for ring-page-order should always be max-ring-page-order. Any mixture is an error, unless both values are consistent with one another. Jan
Keir Fraser
2012-Feb-07 13:49 UTC
Re: [PATCH 5 of 5] blkif.h: Define and document the request number/size/segments extension
On 07/02/2012 21:45, "Justin Gibbs" <justing@spectralogic.com> wrote:>> NAK. No backwards incompatible changes allowed in public headers. > > Sorry for the slow reply on this. I''ve been experimenting with ways to keep > legacy > source compatibility. After trying lots of things, testing the impact on an > existing blkfront > and blkback implementation, I think the best two options are: > > 1. Version the header file and require consumers to declare the interface > version > they are using. If the version isn''t declared, the default, legacy, > "version 1.0" will > be in effect. > > Positives: No change in or constant naming conventions. Data > structures and > constants for new features are properly hidden from > legacy implementations. > Negatives: Messy #ifdefsWe already have this. See use of __XEN_INTERFACE_VERSION__/__XEN_LATEST_INTERFACE_VERSION__ in the public headers. -- Keir
Ian Jackson
2012-Feb-07 17:33 UTC
Re: [PATCH 3 of 5] blkif.h: Add definitions for virtual block device major numbers
Justin T. Gibbs writes ("[Xen-devel] [PATCH 3 of 5] blkif.h: Add definitions for virtual block device major numbers"):> +/* > + * Xen-defined major numbers for virtual block devices. > + */Isn''t this the same thing as is in docs/misc/vbd-interface.txt only expressed differently ? Ian.
Ian Jackson
2012-Feb-07 17:51 UTC
Re: [PATCH 2 of 5] blkif.h: Provide more complete documentation of the blkif interface
Justin T. Gibbs writes ("[Xen-devel] [PATCH 2 of 5] blkif.h: Provide more complete documentation of the blkif interface"):> o Document the XenBus nodes used in this protocolAwesome. But I don''t think all of these are really supposed to be part of the guest interface, so you should probably mention which ones are. For example,> + * params > + * Values: stringthis is not intended for use by guests and they should not look at it.> + * virtual-device > + * Values: <uint16_t> (XEN_*_MAJOR << 8 | Minor)This should be a reference to docs/misc/vbd-interface.txt. But isn''t this also encoded in the device path in xenstore ? I''m not qualified to comment on the state machine. Ian.
Justin Gibbs
2012-Feb-07 21:45 UTC
Re: [PATCH 5 of 5] blkif.h: Define and document the request number/size/segments extension
On Feb 3, 2012, at 6:34 AM, Jan Beulich wrote:>>>> On 03.02.12 at 06:24, "Justin T. Gibbs" <justing@spectralogic.com> wrote: >> xen/include/public/io/blkif.h | 106 ++++++++++++++++++++++++++++++++++++++++- >> 1 files changed, 103 insertions(+), 3 deletions(-) >> >> >> Note: The definition of BLKIF_MAX_SEGMENTS_PER_REQUEST has changed. >> Drivers must be updated to, at minimum, use >> BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK, before being compatible >> with this header file. > > NAK. No backwards incompatible changes allowed in public headers.Sorry for the slow reply on this. I''ve been experimenting with ways to keep legacy source compatibility. After trying lots of things, testing the impact on an existing blkfront and blkback implementation, I think the best two options are: 1. Version the header file and require consumers to declare the interface version they are using. If the version isn''t declared, the default, legacy, "version 1.0" will be in effect. Positives: No change in or constant naming conventions. Data structures and constants for new features are properly hidden from legacy implementations. Negatives: Messy #ifdefs 2. Create a new constant BLKIF_MAX_SEGS_PER_REQUEST, have the two other new constants (for header and segment block counts) use this convention, and leave BLKIF_MAX_SEGMENTS_PER_REQUEST alone (but marked DEPRECATED). Positives: No #ifdefs. Shorter names for these constants. Negatives: Changes the existing constant naming convention for updated drivers. Leaves BLKIF_MAX_SEGMENTS_PER_REQUEST, a constant that should no longer be used, visible. Do you have a preference or a suggested alternative? Thanks, Justin
Keir Fraser
2012-Feb-08 06:00 UTC
Re: [PATCH 5 of 5] blkif.h: Define and document the request number/size/segments extension
On 08/02/2012 07:48, "Jan Beulich" <JBeulich@suse.com> wrote:>>>> On 07.02.12 at 14:49, Keir Fraser <keir.xen@gmail.com> wrote: >> On 07/02/2012 21:45, "Justin Gibbs" <justing@spectralogic.com> wrote: >> >>> 1. Version the header file and require consumers to declare the interface >>> version >>> they are using. If the version isn''t declared, the default, legacy, >>> "version 1.0" will >>> be in effect. >>> >>> Positives: No change in or constant naming conventions. Data >>> structures and >>> constants for new features are properly hidden from >>> legacy implementations. >>> Negatives: Messy #ifdefs >> >> We already have this. See use of >> __XEN_INTERFACE_VERSION__/__XEN_LATEST_INTERFACE_VERSION__ in the public >> headers. > > Hmm, I would think these should specifically not be used in the > io/ subtree - those aren''t definitions of the interface to Xen, but > ones shared between the respective backends and frontends. > Each interface is (apart from its relying on the ring definitions) > entirely self contained.I guess I think of the whole header set under xen/include/public as one unit, versioned by __XEN_INTERFACE_VERSION__. And that''s how users are generally syncing with our headers -- copy them in their entirety over the top of the old ones. I''m not over fussed which solution you agree on in this specific case however. -- Keir> Jan >
Jan Beulich
2012-Feb-08 07:48 UTC
Re: [PATCH 5 of 5] blkif.h: Define and document the request number/size/segments extension
>>> On 07.02.12 at 14:49, Keir Fraser <keir.xen@gmail.com> wrote: > On 07/02/2012 21:45, "Justin Gibbs" <justing@spectralogic.com> wrote: > >>> NAK. No backwards incompatible changes allowed in public headers. >> >> Sorry for the slow reply on this. I''ve been experimenting with ways to keep >> legacy >> source compatibility. After trying lots of things, testing the impact on an >> existing blkfront >> and blkback implementation, I think the best two options are: >> >> 1. Version the header file and require consumers to declare the interface >> version >> they are using. If the version isn''t declared, the default, legacy, >> "version 1.0" will >> be in effect. >> >> Positives: No change in or constant naming conventions. Data >> structures and >> constants for new features are properly hidden from >> legacy implementations. >> Negatives: Messy #ifdefs > > We already have this. See use of > __XEN_INTERFACE_VERSION__/__XEN_LATEST_INTERFACE_VERSION__ in the public > headers.Hmm, I would think these should specifically not be used in the io/ subtree - those aren''t definitions of the interface to Xen, but ones shared between the respective backends and frontends. Each interface is (apart from its relying on the ring definitions) entirely self contained. Jan
Jan Beulich
2012-Feb-08 07:49 UTC
Re: [PATCH 5 of 5] blkif.h: Define and document the request number/size/segments extension
>>> On 07.02.12 at 22:45, Justin Gibbs <justing@spectralogic.com> wrote: > On Feb 3, 2012, at 6:34 AM, Jan Beulich wrote: > >>>>> On 03.02.12 at 06:24, "Justin T. Gibbs" <justing@spectralogic.com> wrote: >>> xen/include/public/io/blkif.h | 106 > ++++++++++++++++++++++++++++++++++++++++- >>> 1 files changed, 103 insertions(+), 3 deletions(-) >>> >>> >>> Note: The definition of BLKIF_MAX_SEGMENTS_PER_REQUEST has changed. >>> Drivers must be updated to, at minimum, use >>> BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK, before being compatible >>> with this header file. >> >> NAK. No backwards incompatible changes allowed in public headers. > > Sorry for the slow reply on this. I''ve been experimenting with ways to keep > legacy > source compatibility. After trying lots of things, testing the impact on an > existing blkfront > and blkback implementation, I think the best two options are: > > 1. Version the header file and require consumers to declare the interface > version > they are using. If the version isn''t declared, the default, legacy, > "version 1.0" will > be in effect. > > Positives: No change in or constant naming conventions. Data > structures and > constants for new features are properly hidden from > legacy implementations. > Negatives: Messy #ifdefs > > 2. Create a new constant BLKIF_MAX_SEGS_PER_REQUEST, have the two other > new constants (for header and segment block counts) use this > convention, and > leave BLKIF_MAX_SEGMENTS_PER_REQUEST alone (but marked DEPRECATED). > > Positives: No #ifdefs. Shorter names for these constants. > Negatives: Changes the existing constant naming convention for updated > drivers. Leaves BLKIF_MAX_SEGMENTS_PER_REQUEST, a > constant > that should no longer be used, visible. > > Do you have a preference or a suggested alternative?I would personally prefer 2. Jan
Jan Beulich
2012-Feb-08 16:20 UTC
Re: [PATCH 5 of 5] blkif.h: Define and document the request number/size/segments extension
>>> On 08.02.12 at 07:00, Keir Fraser <keir.xen@gmail.com> wrote: > I guess I think of the whole header set under xen/include/public as one > unit, versioned by __XEN_INTERFACE_VERSION__. And that''s how users are > generally syncing with our headers -- copy them in their entirety over the > top of the old ones.Minus the fact that copying them in their entirety isn''t possible with what is upstream in Linux, as they diverged in more than just white space. Jan
Justin T. Gibbs
2012-Feb-08 22:00 UTC
Re: [PATCH 4 of 5] blkif.h: Document the RedHat and Citrix blkif multi-page ring extensions
On Feb 3, 2012, at 9:12 AM, Jan Beulich wrote:> > On 03.02.12 at 16:19, Justin Gibbs <justing@spectralogic.com> wrote: > > > > However, if this is the rule, both types of "max ring size" values > > are "in effect" even if a back-end does not provide them both. So > > how do you resolve the conflict? A fully interoperable front should > > allocate the largest possible ring and advertise that size through > > both mechanisms in a fully consistent manner. That''s what I was > > trying to indicate by writing the spec this way. > > Hmm, I would think a fully compatible frontend should bail (revert to > single page ring) on inconsistent max-ring-pages and > max-ring-page-order, if both are provided by the backend. The limit > for ring-pages should always be max-ring-pages, while the one for > ring-page-order should always be max-ring-page-order. Any mixture > is an error, unless both values are consistent with one another. > > JanMandating that a front-end publish inconsistent values to the XenStore would be a mistake. Having the front-end only publish the set of nodes that are recognized by the back-end just adds code complexity with no associated increase in functionality. You actually would lose the ability to tell that the driver supports both schemes. To clarify what I mean by that, consider the current state of back-ends in the world. They fall into 1 of 4 camps: 1. No multi-page ring support 2. "Citrix style" multi-page ring support. 3. "Red Hat style" multi-page ring support. 4. "Both styles" multi-page ring support. In cases 1-3, one or both of the max-ring-page* nodes is not present. Per the currently proposed spec, this means that these nodes have their default values. You seem to be advocating that the front-end write the default values into its corresponding node. For cases 2 and 3, this would lead to contradictory values in the XenStore (e.g. ring-page-order=0 and ring-pages=4). This will not confuse the back-end, but could confuse a human. For case 4, the back-end must publish both node types even though the front-end may not look at both. Since we can''t avoid having both nodes, it seems reasonable to allow a front-end to publish them both too. It also avoids the additional logic to test against back-end node presence. With all that said, I believe the spec should indicate that a fully-interoperable implementation should publish both nodes with values that match the allocated ring size. The tolerance level for a back-end publishing inconsistent nodes should be left to the implementation. -- Justin
Justin T. Gibbs
2012-Feb-08 23:12 UTC
Re: [PATCH 3 of 5] blkif.h: Add definitions for virtual block device major numbers
On Feb 7, 2012, at 10:33 AM, Ian Jackson wrote:> Justin T. Gibbs writes ("[Xen-devel] [PATCH 3 of 5] blkif.h: Add definitions for virtual block device major numbers"): > > +/* > > + * Xen-defined major numbers for virtual block devices. > > + */ > > Isn''t this the same thing as is in > docs/misc/vbd-interface.txt > only expressed differently ?Ian. I wasn''t aware of this file. After a brief look, it seems there is information missing from both vbd-interface.txt and blkif.h. Per vbd-interface.txt, there is also an error in blkif.h. The "virtual-device" node must tolerate 32bit integers. I''ll fix the size specification for the "virtual-device" node, but how should I reconcile blkif.h and vbd-interface.txt. I hate information duplication since one version is invariably stale. -- Justin
Justin T. Gibbs
2012-Feb-08 23:24 UTC
Re: [PATCH 2 of 5] blkif.h: Provide more complete documentation of the blkif interface
On Feb 7, 2012, at 10:51 AM, Ian Jackson wrote:> Justin T. Gibbs writes ("[Xen-devel] [PATCH 2 of 5] blkif.h: Provide more complete documentation of the blkif interface"): >> o Document the XenBus nodes used in this protocol > > Awesome. But I don''t think all of these are really supposed to be > part of the guest interface, so you should probably mention which ones > are. > > For example, > >> + * params >> + * Values: string > > this is not intended for use by guests and they should not look at it.Guests can export back-end devices. Here at Spectra we do this all the time. :-) It would be hard to implement a blkback driver without this information. The comment at the top of your email implies you''re okay with it being in this file, but that I should mark "local-end use only" nodes?> >> + * virtual-device >> + * Values: <uint16_t> (XEN_*_MAJOR << 8 | Minor) > > This should be a reference to docs/misc/vbd-interface.txt. But isn''t > this also encoded in the device path in xenstore ?It would appear so. Should this path naming convention be mandated by this file? Since "virtual-device" exists, it seems that front-ends should read it, not parse the path, to determine this information. -- Justin
Justin T. Gibbs
2012-Feb-09 06:22 UTC
Re: [PATCH 5 of 5] blkif.h: Define and document the request number/size/segments extension
On Feb 8, 2012, at 12:48 AM, Jan Beulich wrote:>>>> On 07.02.12 at 14:49, Keir Fraser <keir.xen@gmail.com> wrote: >> On 07/02/2012 21:45, "Justin Gibbs" <justing@spectralogic.com> wrote: >> >>>> NAK. No backwards incompatible changes allowed in public headers. >>> >>> Sorry for the slow reply on this. I''ve been experimenting with ways to keep >>> legacy >>> source compatibility. After trying lots of things, testing the impact on an >>> existing blkfront >>> and blkback implementation, I think the best two options are: >>> >>> 1. Version the header file and require consumers to declare the interface >>> version >>> they are using. If the version isn''t declared, the default, legacy, >>> "version 1.0" will >>> be in effect. >>> >>> Positives: No change in or constant naming conventions. Data >>> structures and >>> constants for new features are properly hidden from >>> legacy implementations. >>> Negatives: Messy #ifdefs >> >> We already have this. See use of >> __XEN_INTERFACE_VERSION__/__XEN_LATEST_INTERFACE_VERSION__ in the public >> headers. > > Hmm, I would think these should specifically not be used in the > io/ subtree - those aren''t definitions of the interface to Xen, but > ones shared between the respective backends and frontends. > Each interface is (apart from its relying on the ring definitions) > entirely self contained. > > JanThe versioning required allows a driver to declare, "I am compatible with any source compatibility breaking changes up to version X of the header file". Declaring support for the latest version does not require that a driver implement the new extensions. Just one constant needs to be renamed. So I don''t see this as really altering the interface between front and backends (i.e. it is not a "blkif2") If the xen-compat.h behavior is that you can safely import the public headers so long as you do not bump __XEN_INTERFACE_VERSION__ until after you have audited and adjusted for the #ifdef guarded changes in the header files, then that is exactly what is needed in blkif.h. -- Justin
Jan Beulich
2012-Feb-09 09:15 UTC
Re: [PATCH 4 of 5] blkif.h: Document the RedHat and Citrix blkif multi-page ring extensions
>>> On 08.02.12 at 23:00, "Justin T. Gibbs" <justing@spectralogic.com> wrote: > On Feb 3, 2012, at 9:12 AM, Jan Beulich wrote: > >> > On 03.02.12 at 16:19, Justin Gibbs <justing@spectralogic.com> wrote: >> > >> > However, if this is the rule, both types of "max ring size" values >> > are "in effect" even if a back-end does not provide them both. So >> > how do you resolve the conflict? A fully interoperable front should >> > allocate the largest possible ring and advertise that size through >> > both mechanisms in a fully consistent manner. That''s what I was >> > trying to indicate by writing the spec this way. >> >> Hmm, I would think a fully compatible frontend should bail (revert to >> single page ring) on inconsistent max-ring-pages and >> max-ring-page-order, if both are provided by the backend. The limit >> for ring-pages should always be max-ring-pages, while the one for >> ring-page-order should always be max-ring-page-order. Any mixture >> is an error, unless both values are consistent with one another. >> >> Jan > > Mandating that a front-end publish inconsistent values to the > XenStore would be a mistake. Having the front-end only publish the > set of nodes that are recognized by the back-end just adds code > complexity with no associated increase in functionality. You > actually would lose the ability to tell that the driver supports > both schemes. > > To clarify what I mean by that, consider the current state of > back-ends in the world. They fall into 1 of 4 camps: > > 1. No multi-page ring support > 2. "Citrix style" multi-page ring support. > 3. "Red Hat style" multi-page ring support. > 4. "Both styles" multi-page ring support. > > In cases 1-3, one or both of the max-ring-page* nodes is not present. > Per the currently proposed spec, this means that these nodes have > their default values. You seem to be advocating that the front-end > write the default values into its corresponding node. For cases 2 > and 3, this would lead to contradictory values in the XenStore (e.g. > ring-page-order=0 and ring-pages=4). This will not confuse the > back-end, but could confuse a human.No, that''s not what I intended to propose. Instead, I''d see the frontend write consistent values (and best always write both node flavors) if at least one of the protocols is supported by the backend. I''m only suggesting that the frontend should not use either multi- page rings at all if it finds both backend node flavors present yet having inconsistent values (i.e. absence of either [but not both] values does *not* count as inconsistent, and hence a single absent node not implicitly assuming its default value).> For case 4, the back-end must publish both node types even though > the front-end may not look at both. Since we can''t avoid having > both nodes, it seems reasonable to allow a front-end to publish > them both too. It also avoids the additional logic to test against > back-end node presence. > > With all that said, I believe the spec should indicate that a > fully-interoperable implementation should publish both nodes with > values that match the allocated ring size. The tolerance level for > a back-end publishing inconsistent nodes should be left to the > implementation.For the moment I decided to only have the frontend handle both flavors in our implementation. That way, not question about inconsistent values published by the backend can arise. Jan
Jan Beulich
2012-Feb-09 09:25 UTC
Re: [PATCH 5 of 5] blkif.h: Define and document the request number/size/segments extension
>>> On 09.02.12 at 07:22, "Justin T. Gibbs" <justing@spectralogic.com> wrote:> On Feb 8, 2012, at 12:48 AM, Jan Beulich wrote: > >>>>> On 07.02.12 at 14:49, Keir Fraser <keir.xen@gmail.com> wrote: >>> On 07/02/2012 21:45, "Justin Gibbs" <justing@spectralogic.com> wrote: >>> >>>>> NAK. No backwards incompatible changes allowed in public headers. >>>> >>>> Sorry for the slow reply on this. I''ve been experimenting with ways to keep >>>> legacy >>>> source compatibility. After trying lots of things, testing the impact on an >>>> existing blkfront >>>> and blkback implementation, I think the best two options are: >>>> >>>> 1. Version the header file and require consumers to declare the interface >>>> version >>>> they are using. If the version isn''t declared, the default, legacy, >>>> "version 1.0" will >>>> be in effect. >>>> >>>> Positives: No change in or constant naming conventions. Data >>>> structures and >>>> constants for new features are properly hidden from >>>> legacy implementations. >>>> Negatives: Messy #ifdefs >>> >>> We already have this. See use of >>> __XEN_INTERFACE_VERSION__/__XEN_LATEST_INTERFACE_VERSION__ in the public >>> headers. >> >> Hmm, I would think these should specifically not be used in the >> io/ subtree - those aren''t definitions of the interface to Xen, but >> ones shared between the respective backends and frontends. >> Each interface is (apart from its relying on the ring definitions) >> entirely self contained. >> >> Jan > > > The versioning required allows a driver to declare, "I am compatible > with any source compatibility breaking changes up to version X of > the header file". Declaring support for the latest version does > not require that a driver implement the new extensions. Just one > constant needs to be renamed. So I don''t see this as really altering > the interface between front and backends (i.e. it is not a "blkif2")Sure. But pulling in header updates should not require *any* other source changes, so long as __XEN_INTERFACE_VERSION__ isn''t bumped. My point about not using __XEN_INTERFACE_VERSION__ under io/ is that these declare protocols that the hypervisor is completely unaware of, whereas the constant really is meant to control compatibility with hypervisor changes. In particular is this or any other change to the protocols under io/ entirely unrelated to the particular hypervisor version (and hence its interface revision). It''s also unclear to me why simply giving new constants new names (instead of changing the meaning of existing ones) is such a big deal - the most strait forward solution doubtlessly is not having any conditionals in that header, and simply add new things with new, unambiguous names. Jan> If the xen-compat.h behavior is that you can safely import the > public headers so long as you do not bump __XEN_INTERFACE_VERSION__ > until after you have audited and adjusted for the #ifdef guarded > changes in the header files, then that is exactly what is needed > in blkif.h. > > -- > Justin
Ian Campbell
2012-Feb-09 09:29 UTC
Re: [PATCH 0 of 5] blkif.h: Document protocol and existing extensions
On Fri, 2012-02-03 at 05:24 +0000, Justin T. Gibbs wrote:> This patch series attempts to document the blkif PV interface and > the various extensions to it that are out in the wild. I assembled > this information by reviewing xend, various versions of the Linux > blkif drivers, fixing bugs in the FreeBSD blkfront driver, and > writing and testing a blkback driver for FreeBSD. > > Even with this experience, given this interface was previously only > documented in source code, I''m sure there are mistakes or ommissions > in what I''ve done here. Corrections welcome.Thanks very much for doing this Justin. Ian.> All but the last patch in the series is source compatible with > existing drivers. This final patch adds FreeBSD''s extension to > allow an I/O to span multiple ring entries. The number of outstanding > requests, and the max I/O size and number of segments per request, > can all be negotiated. Drivers offering this extension are backwards > compatible with existing drivers, but the definition of > BLKIF_MAX_SEGMENTS_PER_REQUEST has been changed to reflect the new > limit of 255. A global replacement of BLKIF_MAX_SEGMENTS_PER_REQUEST > with BLKIF_MAX_SEGMENTS_PER_HEADER_BLOCK is all that''s required to > adjust drivers without the extension to this header change. > > -- > Justin > > xen/include/public/io/blkif.h | 9 +- > xen/include/public/io/blkif.h | 305 ++++++++++++++++++++++++++++++++++------- > xen/include/public/io/blkif.h | 22 +++ > xen/include/public/io/blkif.h | 101 +++++++++++- > xen/include/public/io/blkif.h | 106 ++++++++++++++- > 5 files changed, 466 insertions(+), 77 deletions(-) > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel
On Fri, 2012-02-03 at 05:24 +0000, Justin T. Gibbs wrote:> xen/include/public/io/blkif.h | 9 ++++----- > 1 files changed, 4 insertions(+), 5 deletions(-) > > > o Remove trailing whitespace. > o Remove a blank line so that a comment block is adjacent to the > code it documents. > > No functional changes. > > Signed-off-by: Justin T. Gibbs <justing@spectralogic.com>Acked-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell
2012-Feb-09 09:36 UTC
Re: [PATCH 2 of 5] blkif.h: Provide more complete documentation of the blkif interface
On Wed, 2012-02-08 at 23:24 +0000, Justin T. Gibbs wrote:> On Feb 7, 2012, at 10:51 AM, Ian Jackson wrote: > > > Justin T. Gibbs writes ("[Xen-devel] [PATCH 2 of 5] blkif.h: Provide more complete documentation of the blkif interface"): > >> o Document the XenBus nodes used in this protocol > > > > Awesome. But I don''t think all of these are really supposed to be > > part of the guest interface, so you should probably mention which ones > > are. > > > > For example, > > > >> + * params > >> + * Values: string > > > > this is not intended for use by guests and they should not look at it. > > Guests can export back-end devices. Here at Spectra we do this all the > time. :-)I think Ian meant "guest" in the sense of "frontend", while a backend, even one running in a driver domain, would be considered some sort of service domain rather than a "guest". IOW a "guest" is the purpose for which you are virtualising, while other domains and just part of the platform I admit the terminology is not terribly well defined though.> It would be hard to implement a blkback driver without this information. > The comment at the top of your email implies you''re okay with it being > in this file, but that I should mark "local-end use only" nodes?I think that makes sense, "back-end only" or perhaps some allusion to it being part of the tools->blkback configuration might express it better?> > > >> + * virtual-device > >> + * Values: <uint16_t> (XEN_*_MAJOR << 8 | Minor) > > > > This should be a reference to docs/misc/vbd-interface.txt. But isn''t > > this also encoded in the device path in xenstore ? > > It would appear so. Should this path naming convention be mandated by > this file? Since "virtual-device" exists, it seems that front-ends should > read it, not parse the path, to determine this information.This seems sensible and matches the actual behaviour of, at least, the Linux frontend. Ian.
Ian Campbell
2012-Feb-09 09:46 UTC
Re: [PATCH 3 of 5] blkif.h: Add definitions for virtual block device major numbers
On Fri, 2012-02-03 at 15:49 +0000, Justin Gibbs wrote:> On Feb 3, 2012, at 6:31 AM, Jan Beulich wrote: > > >>>> On 03.02.12 at 06:24, "Justin T. Gibbs" <justing@spectralogic.com> wrote: > > > > These being mostly meaningless to non-Linux, I don''t think they > > belong here. > > > > Jan > > [blkif major number #defines deleted] > > Front and backends must use some common convention for communicating this > virtual device mapping. Since Linux was the first Dom0, all blkif drivers that I''m > aware of, regardless of OS platform, reference the Linux values.My understanding was that for BSD guests folks generally use the ability to use a raw number as the disk spec (as referred to by the last paragraph of "Configuration file syntax" in docs/misc/vbd-interface.txt. However if you tell me that isn''t the case in practice I''d be quite prepared to believe you ;-)> The mapping really only matters for HVM guests and the hand-off between a > QEMU emulated device and the PV driver. To avoid confusion (e.g. name of > root mount device, device names in fstab), the PV drivers export alias devices named > to match the devices QEMU emulates. The only way to do this successfully is to > know the major number scheme used by Dom0/QEMU.QEMU emulates primary and secondary IDE master+slave, it has no concept of a major or minor number. Why does the dom0 numbering scheme matter? (Does this relate to the old, or possibly current, xend misbehaviour of stat()ing the dom0 device while creating a domain?) On Linux we treat d0-d3 (however they are specified) as aliasing those 4 IDE devices (although we don''t actual present alias devices but instead use UUID or LABEL based fstab etc). Whatever non-Linux does for weird compatibility reasons here I think the right place to document it would be in docs/misc/vbd-interface.txt following the precedent of the "Notes on Linux as a guest" in that document. Ian.
Ian Campbell
2012-Feb-09 09:48 UTC
Re: [PATCH 4 of 5] blkif.h: Document the RedHat and Citrix blkif multi-page ring extensions
On Fri, 2012-02-03 at 14:58 +0000, Justin Gibbs wrote:> On Feb 3, 2012, at 6:33 AM, Jan Beulich wrote: > > >>>> On 03.02.12 at 06:24, "Justin T. Gibbs" <justing@spectralogic.com> wrote: > >> @@ -187,6 +216,25 @@ > >> * The machine ABI rules governing the format of all ring request and > >> * response structures. > >> * > >> + * ring-page-order > >> + * Values: <uint32_t> > >> + * Default Value: 0 > >> + * Maximum Value: MAX(ffs(max-ring-pages) - 1, max-ring-page-order) > > > > Why not just max-ring-page-order. After all, the value is meaningless > > for a backend that only exposes max-ring-pages. > > > >> + * Notes: 1, 3 > >> + * > >> + * The size of the frontend allocated request ring buffer in units > >> + * of lb(machine pages). (e.g. 0 == 1 page, 1 = 2 pages, 2 == 4 pages, > >> + * etc.). > >> + * > >> + * ring-pages > >> + * Values: <uint32_t> > >> + * Default Value: 1 > >> + * Maximum Value: MAX(max-ring-pages,(0x1 << max-ring-page-order)) > > > > Similarly here - just max-ring-pages should do. > > This patch documents existing extensions. For a new driver to properly negotiate a > multi-page ring with a Dom0 on EC2 (ring-pages) or a Citrix Dom0/guest (ring-page-order), > you must use the XenBus node names as I''ve defined them here.Can we pick one and mark it as preferred and the other deprecated or similar? Perhaps backends will have to support both for the foreseeable future but we should make it possible for frontends to just pick one if that''s what they want while nudging them in the direction of all picking the same one. I think the decision should made by the current state of mainline Linux & BSD front and backends, i.e. we should prefer what has been upstreamed rather than private forks if possible. Does anyone know what that state is? Ian.
Paul Durrant
2012-Feb-09 10:05 UTC
Re: [PATCH 3 of 5] blkif.h: Add definitions for virtual block device major numbers
> -----Original Message----- > QEMU emulates primary and secondary IDE master+slave, it has no concept > of a major or minor number. Why does the dom0 numbering scheme > matter?For Windows PV drivers I parse the vbd number so I can extract the disk number which is then present as the scsi target id. I''d prefer if vbd-interface.txt remained the canonical place where the numbering scheme is specified though. Putting definitions into blkif.h just weakens that position. Paul> (Does this relate to the old, or possibly current, xend misbehaviour of > stat()ing the dom0 device while creating a domain?) > > On Linux we treat d0-d3 (however they are specified) as aliasing those 4 IDE > devices (although we don''t actual present alias devices but instead use UUID > or LABEL based fstab etc). > > Whatever non-Linux does for weird compatibility reasons here I think the > right place to document it would be in docs/misc/vbd-interface.txt > following the precedent of the "Notes on Linux as a guest" in that > document. > > Ian. > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel
Jan Beulich
2012-Feb-09 10:36 UTC
Re: [PATCH 4 of 5] blkif.h: Document the RedHat and Citrix blkif multi-page ring extensions
>>> On 09.02.12 at 10:48, Ian Campbell <Ian.Campbell@citrix.com> wrote: > I think the decision should made by the current state of mainline Linux > & BSD front and backends, i.e. we should prefer what has been upstreamed > rather than private forks if possible. Does anyone know what that state > is?Nothing at all is upstream on the Linux side. Jan
Ian Campbell
2012-Feb-09 10:40 UTC
Re: [PATCH 4 of 5] blkif.h: Document the RedHat and Citrix blkif multi-page ring extensions
On Thu, 2012-02-09 at 10:36 +0000, Jan Beulich wrote:> >>> On 09.02.12 at 10:48, Ian Campbell <Ian.Campbell@citrix.com> wrote: > > I think the decision should made by the current state of mainline Linux > > & BSD front and backends, i.e. we should prefer what has been upstreamed > > rather than private forks if possible. Does anyone know what that state > > is? > > Nothing at all is upstream on the Linux side.Then there may be a case for marking _both_ deprecated and undeprecating whichever one gets upstreamed first. Of course if something is already in BSD we should just choose that. Ian.
Keir Fraser
2012-Feb-09 12:32 UTC
Re: [PATCH 5 of 5] blkif.h: Define and document the request number/size/segments extension
On 08/02/2012 22:22, "Justin T. Gibbs" <justing@spectralogic.com> wrote:> The versioning required allows a driver to declare, "I am compatible > with any source compatibility breaking changes up to version X of > the header file". Declaring support for the latest version does > not require that a driver implement the new extensions. Just one > constant needs to be renamed. So I don''t see this as really altering > the interface between front and backends (i.e. it is not a "blkif2") > > If the xen-compat.h behavior is that you can safely import the > public headers so long as you do not bump __XEN_INTERFACE_VERSION__ > until after you have audited and adjusted for the #ifdef guarded > changes in the header files, then that is exactly what is needed > in blkif.h.That''s what XEN_INTERFACE_VERSION is for, yes. -- Keir
Keir Fraser
2012-Feb-09 12:44 UTC
Re: [PATCH 5 of 5] blkif.h: Define and document the request number/size/segments extension
On 09/02/2012 01:25, "Jan Beulich" <JBeulich@suse.com> wrote:>>>> On 09.02.12 at 07:22, "Justin T. Gibbs" <justing@spectralogic.com> wrote: > >> On Feb 8, 2012, at 12:48 AM, Jan Beulich wrote:>>> >>> Hmm, I would think these should specifically not be used in the >>> io/ subtree - those aren''t definitions of the interface to Xen, but >>> ones shared between the respective backends and frontends. >>> Each interface is (apart from its relying on the ring definitions) >>> entirely self contained. >>> >>> Jan >> >> >> The versioning required allows a driver to declare, "I am compatible >> with any source compatibility breaking changes up to version X of >> the header file". Declaring support for the latest version does >> not require that a driver implement the new extensions. Just one >> constant needs to be renamed. So I don''t see this as really altering >> the interface between front and backends (i.e. it is not a "blkif2") > > Sure. But pulling in header updates should not require *any* other > source changes, so long as __XEN_INTERFACE_VERSION__ isn''t > bumped. > > My point about not using __XEN_INTERFACE_VERSION__ under io/ > is that these declare protocols that the hypervisor is completely > unaware of, whereas the constant really is meant to control > compatibility with hypervisor changes. In particular is this or any > other change to the protocols under io/ entirely unrelated to the > particular hypervisor version (and hence its interface revision).Actually __XEN_INTERFACE_VERSION__ is simply versioning the *source-level API* exposed by those headers. It''s not really directly linked to versioning of the hypervisor ABI, after all that has to ensure backward compatibility whatever we do. Perhaps __XEN_INTERFACE_VERSION__ is simply badly named. -- Keir> It''s also unclear to me why simply giving new constants new names > (instead of changing the meaning of existing ones) is such a big > deal - the most strait forward solution doubtlessly is not having > any conditionals in that header, and simply add new things with > new, unambiguous names. > > Jan > >> If the xen-compat.h behavior is that you can safely import the >> public headers so long as you do not bump __XEN_INTERFACE_VERSION__ >> until after you have audited and adjusted for the #ifdef guarded >> changes in the header files, then that is exactly what is needed >> in blkif.h. >> >> -- >> Justin > > >
Justin T. Gibbs
2012-Feb-09 14:45 UTC
Re: [PATCH 5 of 5] blkif.h: Define and document the request number/size/segments extension
On Feb 9, 2012, at 2:25 AM, Jan Beulich wrote:>>>> On 09.02.12 at 07:22, "Justin T. Gibbs" <justing@spectralogic.com> wrote: > >> On Feb 8, 2012, at 12:48 AM, Jan Beulich wrote: >> >>>>>> On 07.02.12 at 14:49, Keir Fraser <keir.xen@gmail.com> wrote: >>>> On 07/02/2012 21:45, "Justin Gibbs" <justing@spectralogic.com> wrote: >>>> >>>>>> NAK. No backwards incompatible changes allowed in public headers. >>>>> >>>>> Sorry for the slow reply on this. I''ve been experimenting with ways to keep >>>>> legacy >>>>> source compatibility. After trying lots of things, testing the impact on an >>>>> existing blkfront >>>>> and blkback implementation, I think the best two options are: >>>>> >>>>> 1. Version the header file and require consumers to declare the interface >>>>> version >>>>> they are using. If the version isn''t declared, the default, legacy, >>>>> "version 1.0" will >>>>> be in effect. >>>>> >>>>> Positives: No change in or constant naming conventions. Data >>>>> structures and >>>>> constants for new features are properly hidden from >>>>> legacy implementations. >>>>> Negatives: Messy #ifdefs >>>> >>>> We already have this. See use of >>>> __XEN_INTERFACE_VERSION__/__XEN_LATEST_INTERFACE_VERSION__ in the public >>>> headers. >>> >>> Hmm, I would think these should specifically not be used in the >>> io/ subtree - those aren''t definitions of the interface to Xen, but >>> ones shared between the respective backends and frontends. >>> Each interface is (apart from its relying on the ring definitions) >>> entirely self contained. >>> >>> Jan >> >> >> The versioning required allows a driver to declare, "I am compatible >> with any source compatibility breaking changes up to version X of >> the header file". Declaring support for the latest version does >> not require that a driver implement the new extensions. Just one >> constant needs to be renamed. So I don''t see this as really altering >> the interface between front and backends (i.e. it is not a "blkif2") > > Sure. But pulling in header updates should not require *any* other > source changes, so long as __XEN_INTERFACE_VERSION__ isn''t > bumped.Perhaps my language wasn''t clear. The change to a blkif driver is would only necessary if __XEN_INTERFACE_VERSION__ is bumped. I was trying to say that when the bump is made, the needed change would be quite small.> It''s also unclear to me why simply giving new constants new names > (instead of changing the meaning of existing ones) is such a big > deal - the most strait forward solution doubtlessly is not having > any conditionals in that header, and simply add new things with > new, unambiguous names.The confusion arrises because we''d need to, in order to keep the terminology consistent, rename existing constants too. More specifically, a "request" in the legacy driver is a "logical device operation". I would like to retain that convention even if a logical request consumes multiple ring entries. However, if BLKIF_MAX_SEGMENTS_PER_REQUEST comes to mean the number of segments in the first ring entry of a request, then we need to come up with another name for the maximum number of segments in a "logical request". Changing REQUEST to something else (e.g. "OP") means even more surgery to existing drivers so that they consistently use the new terminology. So, in my alternative proposal, I chose to shorten SEGMENTS to SEGS instead. However, this still leaves BLKIF_MAX_SEGMENTS_PER_REQUEST around for folks to use incorrectly. I''d rather just version BLKIF_MAX_SEGMENTS_PER_REQUEST. I''ll post a trial patch with this change shortly. -- Justin
Justin T. Gibbs
2012-Feb-09 14:56 UTC
Re: [PATCH 4 of 5] blkif.h: Document the RedHat and Citrix blkif multi-page ring extensions
On Feb 9, 2012, at 2:15 AM, Jan Beulich wrote:>>>> On 08.02.12 at 23:00, "Justin T. Gibbs" <justing@spectralogic.com> wrote: >> On Feb 3, 2012, at 9:12 AM, Jan Beulich wrote: >> >>>> On 03.02.12 at 16:19, Justin Gibbs <justing@spectralogic.com> wrote: >>>> >>>> However, if this is the rule, both types of "max ring size" values >>>> are "in effect" even if a back-end does not provide them both. So >>>> how do you resolve the conflict? A fully interoperable front should >>>> allocate the largest possible ring and advertise that size through >>>> both mechanisms in a fully consistent manner. That''s what I was >>>> trying to indicate by writing the spec this way. >>> >>> Hmm, I would think a fully compatible frontend should bail (revert to >>> single page ring) on inconsistent max-ring-pages and >>> max-ring-page-order, if both are provided by the backend. The limit >>> for ring-pages should always be max-ring-pages, while the one for >>> ring-page-order should always be max-ring-page-order. Any mixture >>> is an error, unless both values are consistent with one another. >>> >>> Jan >> >> Mandating that a front-end publish inconsistent values to the >> XenStore would be a mistake. Having the front-end only publish the >> set of nodes that are recognized by the back-end just adds code >> complexity with no associated increase in functionality. You >> actually would lose the ability to tell that the driver supports >> both schemes. >> >> To clarify what I mean by that, consider the current state of >> back-ends in the world. They fall into 1 of 4 camps: >> >> 1. No multi-page ring support >> 2. "Citrix style" multi-page ring support. >> 3. "Red Hat style" multi-page ring support. >> 4. "Both styles" multi-page ring support. >> >> In cases 1-3, one or both of the max-ring-page* nodes is not present. >> Per the currently proposed spec, this means that these nodes have >> their default values. You seem to be advocating that the front-end >> write the default values into its corresponding node. For cases 2 >> and 3, this would lead to contradictory values in the XenStore (e.g. >> ring-page-order=0 and ring-pages=4). This will not confuse the >> back-end, but could confuse a human. > > No, that''s not what I intended to propose. Instead, I''d see the > frontend write consistent values (and best always write both node > flavors) if at least one of the protocols is supported by the backend.Then your desire is already allowed by the maximum values for these nodes in the current spec. I will clarify the notes section to say that, if both node types are supported by a driver, both node types should be written with consistent values.> I''m only suggesting that the frontend should not use either multi- > page rings at all if it finds both backend node flavors present yet > having inconsistent values (i.e. absence of either [but not both] > values does *not* count as inconsistent, and hence a single absent > node not implicitly assuming its default value).I think that''s a perfectly reasonable behavior, but not the only reasonable behavior. If I need to write a front or back end driver that is more tolerant in order to have optimum performance with an existing driver that I do not control, the spec should allow me to do this. -- Justin
Justin T. Gibbs
2012-Feb-09 15:12 UTC
Re: [PATCH 4 of 5] blkif.h: Document the RedHat and Citrix blkif multi-page ring extensions
On Feb 9, 2012, at 2:48 AM, Ian Campbell wrote:> On Fri, 2012-02-03 at 14:58 +0000, Justin Gibbs wrote: >> On Feb 3, 2012, at 6:33 AM, Jan Beulich wrote: >> >>>>>> On 03.02.12 at 06:24, "Justin T. Gibbs" <justing@spectralogic.com> wrote: >>>> @@ -187,6 +216,25 @@ >>>> * The machine ABI rules governing the format of all ring request and >>>> * response structures. >>>> * >>>> + * ring-page-order >>>> + * Values: <uint32_t> >>>> + * Default Value: 0 >>>> + * Maximum Value: MAX(ffs(max-ring-pages) - 1, max-ring-page-order) >>> >>> Why not just max-ring-page-order. After all, the value is meaningless >>> for a backend that only exposes max-ring-pages. >>> >>>> + * Notes: 1, 3 >>>> + * >>>> + * The size of the frontend allocated request ring buffer in units >>>> + * of lb(machine pages). (e.g. 0 == 1 page, 1 = 2 pages, 2 == 4 pages, >>>> + * etc.). >>>> + * >>>> + * ring-pages >>>> + * Values: <uint32_t> >>>> + * Default Value: 1 >>>> + * Maximum Value: MAX(max-ring-pages,(0x1 << max-ring-page-order)) >>> >>> Similarly here - just max-ring-pages should do. >> >> This patch documents existing extensions. For a new driver to properly negotiate a >> multi-page ring with a Dom0 on EC2 (ring-pages) or a Citrix Dom0/guest (ring-page-order), >> you must use the XenBus node names as I''ve defined them here. > > Can we pick one and mark it as preferred and the other deprecated or > similar? Perhaps backends will have to support both for the foreseeable > future but we should make it possible for frontends to just pick one if > that''s what they want while nudging them in the direction of all picking > the same one.History says that back-ends are updated more slowly than front-ends. For example, my fixes to QEMU''s serial port emulation that have been in Xen''s mainline for ~2 years are still not widely deployed on EC2. Folks running FreeBSD there are using an ugly hack to FreeBSD''s serial driver so they can get console output. So, if you want your front-ends to have optimal performance regardless of what cloud or appliance they are running on, the two types will have to be supported for some time. Neither is better than the other, just different. The opposite is true for back-ends. If your customer controls the guest environment and chooses not upgrade, you can''t kill support for the variant they use.> I think the decision should made by the current state of mainline Linux > & BSD front and backends, i.e. we should prefer what has been upstreamed > rather than private forks if possible. Does anyone know what that state > is?The state is a bit embarrassing on the BSD side. FreeBSD has had a multi-page ring extension since October of 2010. Unfortunately, I wrote this extension before stumbling upon the Citrix blkif patch or the extension being used on EC2. It is closest to the EC2 implementation, but not quite compatible. The saving grace is that I don''t know of any deployments of this back-end outside of Spectra, and we have not shipped it to customers, so I plan to upstream block front and back drivers that only implement the Citrix and EC2 style extension once blkif.h settles. A preliminary version of these changes went out for review a couple weeks ago. -- Justin
Ian Jackson
2012-Feb-09 15:17 UTC
Re: [PATCH 3 of 5] blkif.h: Add definitions for virtual block device major numbers
Justin T. Gibbs writes ("Re: [Xen-devel] [PATCH 3 of 5] blkif.h: Add definitions for virtual block device major numbers"):> I wasn''t aware of this file. After a brief look, it seems there is > information missing from both vbd-interface.txt and blkif.h. Per > vbd-interface.txt, there is also an error in blkif.h. The "virtual-device" > node must tolerate 32bit integers. > > I''ll fix the size specification for the "virtual-device" node, but how > should I reconcile blkif.h and vbd-interface.txt. I hate information > duplication since one version is invariably stale.I wrote vbd-interface.txt and I think it''s largely accurate. I wasn''t aware of any duplication with blkif.h. If there are differences then we will have to discuss them here I think, or read the code. Ian.
Ian Jackson
2012-Feb-09 15:19 UTC
Re: [PATCH 2 of 5] blkif.h: Provide more complete documentation of the blkif interface
Justin T. Gibbs writes ("Re: [Xen-devel] [PATCH 2 of 5] blkif.h: Provide more complete documentation of the blkif interface"):> On Feb 7, 2012, at 10:51 AM, Ian Jackson wrote: > > Justin T. Gibbs writes ("[Xen-devel] [PATCH 2 of 5] blkif.h: Provide more complete documentation of the blkif interface"): > >> + * params > >> + * Values: string > > > > this is not intended for use by guests and they should not look at it. > > Guests can export back-end devices. Here at Spectra we do this all the > time. :-)Sorry, I meant, not intended for use by frontends. I was using "guest" as a shorthand for that.> It would be hard to implement a blkback driver without this information. > The comment at the top of your email implies you''re okay with it being > in this file, but that I should mark "local-end use only" nodes?Certainly I think it would be good to document this somewhere and here is as good as any.> >> + * virtual-device > >> + * Values: <uint16_t> (XEN_*_MAJOR << 8 | Minor) > > > > This should be a reference to docs/misc/vbd-interface.txt. But isn''t > > this also encoded in the device path in xenstore ? > > It would appear so. Should this path naming convention be mandated by > this file? Since "virtual-device" exists, it seems that front-ends should > read it, not parse the path, to determine this information.Do we know what current frontends actually do ? Ian.
Justin T. Gibbs
2012-Feb-09 16:40 UTC
Re: [PATCH 3 of 5] blkif.h: Add definitions for virtual block device major numbers
On Feb 9, 2012, at 8:17 AM, Ian Jackson wrote:> Justin T. Gibbs writes ("Re: [Xen-devel] [PATCH 3 of 5] blkif.h: Add definitions for virtual block device major numbers"): >> I wasn''t aware of this file. After a brief look, it seems there is >> information missing from both vbd-interface.txt and blkif.h. Per >> vbd-interface.txt, there is also an error in blkif.h. The "virtual-device" >> node must tolerate 32bit integers. >> >> I''ll fix the size specification for the "virtual-device" node, but how >> should I reconcile blkif.h and vbd-interface.txt. I hate information >> duplication since one version is invariably stale. > > I wrote vbd-interface.txt and I think it''s largely accurate. I wasn''t > aware of any duplication with blkif.h. If there are differences then > we will have to discuss them here I think, or read the code. > > Ian.I was thinking that the high numbered SCSI disk major numbers were not covered by vbd-interface.txt. In looking more closely, they seem to be covered in "Notes on Linux as a guest". However, is this text accurate: "Some old configurations may depend on deprecated high-numbered SCSI and IDE disks. This does not work in recent versions of Linux." The upstreamed Linux driver supports 15 SCSI major numbers. Are 13 of these the "deprecated high-numbered" ones, or were there more than 15 at some point? -- Justin
Justin T. Gibbs
2012-Feb-09 16:41 UTC
Re: [PATCH 3 of 5] blkif.h: Add definitions for virtual block device major numbers
On Feb 9, 2012, at 3:05 AM, Paul Durrant wrote:>> -----Original Message----- >> QEMU emulates primary and secondary IDE master+slave, it has no concept >> of a major or minor number. Why does the dom0 numbering scheme >> matter? > > For Windows PV drivers I parse the vbd number so I can extract the disk number which is then present as the scsi target id. I''d prefer if vbd-interface.txt remained the canonical place where the numbering scheme is specified though. Putting definitions into blkif.h just weakens that position.I''ll update the patch to just reference vbd-interface.txt. -- Justin
Ian Jackson
2012-Feb-09 16:52 UTC
Re: [PATCH 3 of 5] blkif.h: Add definitions for virtual block device major numbers
Justin T. Gibbs writes ("Re: [Xen-devel] [PATCH 3 of 5] blkif.h: Add definitions for virtual block device major numbers"):> The upstreamed Linux driver supports 15 SCSI major numbers. > Are 13 of these the "deprecated high-numbered" ones, or were > there more than 15 at some point?My understanding was that current upstream Linux does not ever "steal" scsi major numbers for xen vbds. If this is wrong then so is my document :-). Ian.
Ian Campbell
2012-Feb-09 17:02 UTC
Re: [PATCH 3 of 5] blkif.h: Add definitions for virtual block device major numbers
On Thu, 2012-02-09 at 16:52 +0000, Ian Jackson wrote:> Justin T. Gibbs writes ("Re: [Xen-devel] [PATCH 3 of 5] blkif.h: Add definitions for virtual block device major numbers"): > > The upstreamed Linux driver supports 15 SCSI major numbers. > > Are 13 of these the "deprecated high-numbered" ones, or were > > there more than 15 at some point? > > My understanding was that current upstream Linux does not ever "steal" > scsi major numbers for xen vbds. If this is wrong then so is my > document :-).Looks like it understands the SCSI/IDE major numbers but translates them to the xvd major since: commit c80a420995e721099906607b07c09a24543b31d9 Author: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Date: Thu Dec 2 17:55:00 2010 +0000 xen-blkfront: handle Xen major numbers other than XENVBD This patch makes sure blkfront handles correctly virtual device numbers corresponding to Xen emulated IDE and SCSI disks: in those cases blkfront translates the major number to XENVBD and the minor number to a low xvd minor. Note: this behaviour is different from what old xenlinux PV guests used to do: they used to steal an IDE or SCSI major number and use it instead. Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com> Acked-by: Jeremy Fitzhardinge <jeremy@goop.org> Ian.
Ian Campbell
2012-Feb-13 12:35 UTC
Re: [PATCH 4 of 5] blkif.h: Document the RedHat and Citrix blkif multi-page ring extensions
On Thu, 2012-02-09 at 15:12 +0000, Justin T. Gibbs wrote:> On Feb 9, 2012, at 2:48 AM, Ian Campbell wrote: > > > On Fri, 2012-02-03 at 14:58 +0000, Justin Gibbs wrote: > >> On Feb 3, 2012, at 6:33 AM, Jan Beulich wrote: > >> > >>>>>> On 03.02.12 at 06:24, "Justin T. Gibbs" <justing@spectralogic.com> wrote: > >>>> @@ -187,6 +216,25 @@ > >>>> * The machine ABI rules governing the format of all ring request and > >>>> * response structures. > >>>> * > >>>> + * ring-page-order > >>>> + * Values: <uint32_t> > >>>> + * Default Value: 0 > >>>> + * Maximum Value: MAX(ffs(max-ring-pages) - 1, max-ring-page-order) > >>> > >>> Why not just max-ring-page-order. After all, the value is meaningless > >>> for a backend that only exposes max-ring-pages. > >>> > >>>> + * Notes: 1, 3 > >>>> + * > >>>> + * The size of the frontend allocated request ring buffer in units > >>>> + * of lb(machine pages). (e.g. 0 == 1 page, 1 = 2 pages, 2 == 4 pages, > >>>> + * etc.). > >>>> + * > >>>> + * ring-pages > >>>> + * Values: <uint32_t> > >>>> + * Default Value: 1 > >>>> + * Maximum Value: MAX(max-ring-pages,(0x1 << max-ring-page-order)) > >>> > >>> Similarly here - just max-ring-pages should do. > >> > >> This patch documents existing extensions. For a new driver to properly negotiate a > >> multi-page ring with a Dom0 on EC2 (ring-pages) or a Citrix Dom0/guest (ring-page-order), > >> you must use the XenBus node names as I''ve defined them here. > > > > Can we pick one and mark it as preferred and the other deprecated or > > similar? Perhaps backends will have to support both for the foreseeable > > future but we should make it possible for frontends to just pick one if > > that''s what they want while nudging them in the direction of all picking > > the same one. > > History says that back-ends are updated more slowly than front-ends. For > example, my fixes to QEMU''s serial port emulation that have been in Xen''s > mainline for ~2 years are still not widely deployed on EC2. Folks running > FreeBSD there are using an ugly hack to FreeBSD''s serial driver so they > can get console output. > > So, if you want your front-ends to have optimal performance regardless of > what cloud or appliance they are running on, the two types will have to be > supported for some time. Neither is better than the other, just different. > > The opposite is true for back-ends. If your customer controls the guest > environment and chooses not upgrade, you can''t kill support for the > variant they use.I agree with all you say, the practicalities certainly mean we may well be stuck with both forever. I don''t think there is any harm in documenting one of them as preferred though. If someone has the freedom (or desire) to only implement one of the two in their code base then that provides guidance as to which it should be, at least then we will be encouraging the right trajectory. Also it gives some small amount of ammunition to people who are stuck with a provider (of either front or backend) of the wrong type for their environment and helps break the stalemate as to who should fix their end (however helpful that might be in practice).> > I think the decision should made by the current state of mainline Linux > > & BSD front and backends, i.e. we should prefer what has been upstreamed > > rather than private forks if possible. Does anyone know what that state > > is? > > The state is a bit embarrassing on the BSD side. FreeBSD has had a > multi-page ring extension since October of 2010. Unfortunately, I wrote > this extension before stumbling upon the Citrix blkif patch or the > extension being used on EC2. It is closest to the EC2 implementation, > but not quite compatible. The saving grace is that I don''t know of any > deployments of this back-end outside of Spectra, and we have not shipped > it to customers, so I plan to upstream block front and back drivers that > only implement the Citrix and EC2 style extension once blkif.h settles. A > preliminary version of these changes went out for review a couple weeks > ago.Cool, thanks. Ian.
Konrad Rzeszutek Wilk
2012-Feb-14 13:56 UTC
Re: [PATCH 4 of 5] blkif.h: Document the RedHat and Citrix blkif multi-page ring extensions
> > The state is a bit embarrassing on the BSD side. FreeBSD has had a > > multi-page ring extension since October of 2010. Unfortunately, I wrote > > this extension before stumbling upon the Citrix blkif patch or the > > extension being used on EC2. It is closest to the EC2 implementation, > > but not quite compatible. The saving grace is that I don''t know of any > > deployments of this back-end outside of Spectra, and we have not shipped > > it to customers, so I plan to upstream block front and back drivers thatExcellent <crosses out another TODO on the "get done at some point list">> > only implement the Citrix and EC2 style extension once blkif.h settles. ASo what is the Red Hat version of this extension? Where can I find it? Is it in the 2.6.18 hg tree?> > preliminary version of these changes went out for review a couple weeks > > ago. > > Cool, thanks. > > Ian. > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel
Justin T. Gibbs
2012-Feb-15 06:51 UTC
Re: [PATCH 4 of 5] blkif.h: Document the RedHat and Citrix blkif multi-page ring extensions
On Feb 14, 2012, at 6:56 AM, Konrad Rzeszutek Wilk wrote:> > > The state is a bit embarrassing on the BSD side. FreeBSD has had a > > > multi-page ring extension since October of 2010. Unfortunately, I wrote > > > this extension before stumbling upon the Citrix blkif patch or the > > > extension being used on EC2. It is closest to the EC2 implementation, > > > but not quite compatible. The saving grace is that I don''t know of any > > > deployments of this back-end outside of Spectra, and we have not shipped > > > it to customers, so I plan to upstream block front and back drivers that > > Excellent <crosses out another TODO on the "get done at some point list"> > > > > only implement the Citrix and EC2 style extension once blkif.h settles. A > > So what is the Red Hat version of this extension? Where can I find it? > Is it in the 2.6.18 hg tree?The RedHat distribution on EC2 seems to be running a derivation of the patch described in this thread: http://xen.1045712.n5.nabble.com/PATCH-multi-page-blkfront-blkback-patch-td2527534.html However, the EC2 backend additionally publishes "max-ring-pages" as described in my changes to blkif.h. You can review the FreeBSD implementation of both extensions in the latest versions of blkfront and back here: http://svnweb.freebsd.org/base/head/sys/dev/xen/ -- Justin
Justin T. Gibbs
2012-Feb-15 13:07 UTC
Re: [PATCH 4 of 5] blkif.h: Document the RedHat and Citrix blkif multi-page ring extensions
On Feb 13, 2012, at 5:35 AM, Ian Campbell wrote:> On Thu, 2012-02-09 at 15:12 +0000, Justin T. Gibbs wrote: >> On Feb 9, 2012, at 2:48 AM, Ian Campbell wrote: >> >>> On Fri, 2012-02-03 at 14:58 +0000, Justin Gibbs wrote: >>>> On Feb 3, 2012, at 6:33 AM, Jan Beulich wrote: >>>> >>>>>>>> On 03.02.12 at 06:24, "Justin T. Gibbs" <justing@spectralogic.com> wrote: >>>>>> @@ -187,6 +216,25 @@ >>>>>> * The machine ABI rules governing the format of all ring request and >>>>>> * response structures. >>>>>> * >>>>>> + * ring-page-order >>>>>> + * Values: <uint32_t> >>>>>> + * Default Value: 0 >>>>>> + * Maximum Value: MAX(ffs(max-ring-pages) - 1, max-ring-page-order) >>>>> >>>>> Why not just max-ring-page-order. After all, the value is meaningless >>>>> for a backend that only exposes max-ring-pages. >>>>> >>>>>> + * Notes: 1, 3 >>>>>> + * >>>>>> + * The size of the frontend allocated request ring buffer in units >>>>>> + * of lb(machine pages). (e.g. 0 == 1 page, 1 = 2 pages, 2 == 4 pages, >>>>>> + * etc.). >>>>>> + * >>>>>> + * ring-pages >>>>>> + * Values: <uint32_t> >>>>>> + * Default Value: 1 >>>>>> + * Maximum Value: MAX(max-ring-pages,(0x1 << max-ring-page-order)) >>>>> >>>>> Similarly here - just max-ring-pages should do. >>>> >>>> This patch documents existing extensions. For a new driver to properly negotiate a >>>> multi-page ring with a Dom0 on EC2 (ring-pages) or a Citrix Dom0/guest (ring-page-order), >>>> you must use the XenBus node names as I''ve defined them here. >>> >>> Can we pick one and mark it as preferred and the other deprecated or >>> similar? Perhaps backends will have to support both for the foreseeable >>> future but we should make it possible for frontends to just pick one if >>> that''s what they want while nudging them in the direction of all picking >>> the same one. >> >> History says that back-ends are updated more slowly than front-ends. For >> example, my fixes to QEMU''s serial port emulation that have been in Xen''s >> mainline for ~2 years are still not widely deployed on EC2. Folks running >> FreeBSD there are using an ugly hack to FreeBSD''s serial driver so they >> can get console output. >> >> So, if you want your front-ends to have optimal performance regardless of >> what cloud or appliance they are running on, the two types will have to be >> supported for some time. Neither is better than the other, just different. >> >> The opposite is true for back-ends. If your customer controls the guest >> environment and chooses not upgrade, you can''t kill support for the >> variant they use. > > I agree with all you say, the practicalities certainly mean we may well > be stuck with both forever. I don''t think there is any harm in > documenting one of them as preferred though. > > If someone has the freedom (or desire) to only implement one of the two > in their code base then that provides guidance as to which it should be, > at least then we will be encouraging the right trajectory. Also it gives > some small amount of ammunition to people who are stuck with a provider > (of either front or backend) of the wrong type for their environment and > helps break the stalemate as to who should fix their end (however > helpful that might be in practice).Anyone care to flip a coin? -- Justin
Ian Campbell
2012-Feb-15 13:32 UTC
Re: [PATCH 4 of 5] blkif.h: Document the RedHat and Citrix blkif multi-page ring extensions
On Wed, 2012-02-15 at 13:07 +0000, Justin T. Gibbs wrote:> On Feb 13, 2012, at 5:35 AM, Ian Campbell wrote: > > If someone has the freedom (or desire) to only implement one of the two > > in their code base then that provides guidance as to which it should be, > > at least then we will be encouraging the right trajectory. Also it gives > > some small amount of ammunition to people who are stuck with a provider > > (of either front or backend) of the wrong type for their environment and > > helps break the stalemate as to who should fix their end (however > > helpful that might be in practice). > > Anyone care to flip a coin?I think as the one who bothered to write things down you get that honour ;-) Ian.