During some discussions and handwaving, including discussions with
some experts on the XenServer/XCP storage architecture, we came up
with what we think might be a plausible proposal for an architecture
for communication between toolstack and driver domain, for storage at
least.

I offered to write it up. The abstract proposal is as I understand
the consensus from our conversation. The concrete protocol is my own
invention.

Please comment. After a round of review here we should consider
whether some of the assumptions need review from the communities
involved in "other" backends (particularly, the BSDs).

(FAOD the implementation of something like this is not 4.3 material,
but it may inform some API decisions etc. we take in 4.2.)

Ian.


Components

 toolstack

 guest
   Might be the toolstack domain, or an (intended) guest vm.

 driver domain
   Responsible for providing the disk service to guests.
   Consists, internally, of (at least):
      control plane
      backend
   but we avoid exposing this internal implementation detail.

   We permit different driver domains on a single host, serving
   different guests or the same guests.

   The toolstack is expected to know the domid of the driver domain.

 driver domain kind
   We permit different "kinds" of driver domain, perhaps implemented
   by completely different code, which support different facilities.

   Each driver domain kind needs to document what targets (see
   below) are valid and how they are specified, and what preparatory
   steps may need to be taken eg at system boot.

   Driver domain kinds do not have a formal presence in the API.

Objects

 target
   A kind of name.

   Combination of a physical location and data format plus all other
   information needed by the underlying mechanisms, or relating to
   the data format, needed to access it.

   These names are assigned by the driver domain kind; the names may
   be an open class; no facility is provided via this API to
   enumerate these.
   Syntactically, these are key/value pairs, mapping short string
   keys to shortish string values, suitable for storage in a
   xenstore directory.

 vdi
   This host's intent to access a specific target.
   Non-persistent, created on request by toolstack, enumerable.
   Possible states: inactive/active.
   Abstract operations: prepare, activate, deactivate, unprepare.

   (We call the "create" operation for this object "prepare" to
   avoid confusion with other kinds of "create".)

   The toolstack promises that no two vdis for the same target
   will simultaneously be active, even if the two vdis are on
   different hosts.

 vbd
   Provision of a facility for a guest to access a particular target
   via a particular vdi. There may be zero or more of these at any
   point for a particular vdi.

   Non-persistent, created on request by toolstack, enumerable.
   Abstract operations: plug, unplug.

   (We call the "create" operation for this object "plug" to avoid
   confusion with other kinds of "create".)

   vbds may be created/destroyed, and the underlying vdi
   activated/deactivated, in any order. However IO is only possible
   to a vbd when the corresponding vdi is active. The reason for
   requiring activation as a separate step is to allow as much of
   the setup for an incoming migration domain's storage to be done
   before committing to the migration and entering the "domain is
   down" stage, during which access is switched from the old to the
   new host.

   We will consider here the case of a vbd which provides
   service as a Xen vbd backend. Other cases (eg, the driver domain
   is the same as the toolstack domain and the vbd provides a block
   device in the toolstack domain) can be regarded as
   optimisations/shortcuts.

Concrete protocol

 The toolstack gives instructions to the driver domain, and receives
 results, via xenstore, in the path:
   /local/domain/<driverdomid>/backendctrl/vdi
 Both driver domain and toolstack have write access to the whole of
 this area.
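
The abstract vdi and vbd objects described under "Objects" above can be read as a small state machine. The following is a minimal sketch of that reading and is not part of the proposal: the class name `Vdi`, its methods, and the use of `None` for "does not exist" are all invented for illustration. It also assumes that a vbd can only be plugged once its vdi has been prepared, which is one interpretation of the abstract description.

```python
from typing import Optional, Set

# Abstract vdi transitions: (current state, operation) -> new state.
# None stands for "the vdi does not exist".
TRANSITIONS = {
    (None, "prepare"): "inactive",
    ("inactive", "activate"): "active",
    ("active", "deactivate"): "inactive",
    ("inactive", "unprepare"): None,
    ("active", "unprepare"): None,
}


class Vdi:
    """Hypothetical in-memory model of one vdi and the vbds using it."""

    def __init__(self) -> None:
        self.state: Optional[str] = None
        self.vbds: Set[str] = set()

    def apply(self, op: str) -> None:
        """Perform prepare/activate/deactivate/unprepare, or raise."""
        key = (self.state, op)
        if key not in TRANSITIONS:
            raise ValueError(f"{op!r} is not valid in state {self.state!r}")
        self.state = TRANSITIONS[key]

    def plug(self, vbd: str) -> None:
        # Assumption: the vdi must exist, but need not be active.
        if self.state is None:
            raise ValueError("vdi has not been prepared")
        self.vbds.add(vbd)

    def unplug(self, vbd: str) -> None:
        self.vbds.remove(vbd)

    def io_possible(self, vbd: str) -> bool:
        # IO needs both a plugged vbd and an active vdi.
        return vbd in self.vbds and self.state == "active"
```

With this model, a vbd plugged while the vdi is inactive exists but cannot do IO until `apply("activate")` has succeeded, which mirrors the incoming-migration case: everything except activation can be set up in advance.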
 Each vdi which has been requested and/or exists, corresponds to a
 path .../backendctrl/vdi/<vdi> where <vdi> is a string (of
 alphanumerics, hyphens and underscores) chosen by the toolstack.
 Inside this, there are the following nodes:

 /local/domain/<driverdomid>/backendctrl/vdi/<vdi>/
   state     The current state. Values are "inactive", "active",
             or ENOENT meaning the vdi does not exist.
             Set by the driver domain in response to requests.

   request   Operation requested by the toolstack and currently
             being performed. Created by the toolstack, but may
             then not be modified by the toolstack. Deleted
             by the driver domain when the operation has completed.

             The values of "request" are:
                 prepare
                 activate
                 deactivate
                 unprepare
                 plug <vbd>
                 unplug <vbd>
             <vbd> is an id chosen by the toolstack like <vdi>

   result    errno value (in decimal, Xen error number) best
             describing the results of the most recently completed
             operation; 0 means success. Created or set by the
             driver domain in the same transaction as it deletes
             request. The toolstack may delete this.

   result_msg  Optional UTF-8 string explaining any error; does not
             exist when result is "0". Created or deleted by the
             driver domain whenever the driver domain sets result.
             The toolstack may delete this.

   t/*       The target name. Must be written by the toolstack.
             But may not be removed or changed while either of
             state or request exist.

   vbd/<vbd>/state
             The state of a vbd, "ok" or ENOENT.
             Set or deleted by the driver domain in response to
             requests.

   vbd/<vbd>/frontend
             The frontend path (complete path in xenstore) which the
             xen vbd should be servicing. Set by the toolstack
             with the plug request and not modified until after
             completion of unplug.

   vbd/<vbd>/backend
             The backend path (complete path in xenstore) which the
             driver domain has chosen for the vbd. Set by the
             driver domain in response to a plug request.

   vbd/<vbd>/b-copy/*
             The driver domain may request, in response to plug,
             that the toolstack copy these values to the specified
             backend directory, in the same transaction as it
             creates the frontend. Set by the driver domain in
             response to a plug request; may be deleted by the
             toolstack. DEPRECATED, see below.

 The operations:

 prepare
   Creates a vdi from a target.
   Preconditions:
       state ENOENT
       request ENOENT
   Request (xenstore writes by toolstack):
       request = "prepare"
       t/* as appropriate
   Results on success (xenstore writes by driver domain):
       request ENOENT      } applies to success from all operations,
       result = "0"        }  will not be restated below
       state = "inactive"
   Results on error (applies to all operations):   }
       request ENOENT                              } applies
       result = some decimal integer errno value   }  to all
       result_msg = ENOENT or a string             }  failures

 activate
   Preconditions:
       state = "inactive"
       request ENOENT
   Request:
       request = "activate"
   Results on success:
       state = "active"

 deactivate
   Preconditions:
       state = "active"
       request ENOENT
   Request:
       request = "deactivate"
   Results on success:
       state = "inactive"

 unprepare
   Preconditions:
       state != ENOENT
       request ENOENT
   Request:
       request = "unprepare"
   Results on success:
       state = ENOENT

 removal, modification, etc. of an unprepared vdi:
   Preconditions:
       state ENOENT
       request ENOENT
   Request:
       any changes to <vdi> directory which do
       not create "state" or "request"
   Results:
       ignored - no response from driver domain

 plug <vbd>
   Preconditions:
       state ENOENT
       request ENOENT
       vbd/<vbd>/state ENOENT
       <frontend> ENOENT
   Request:
       request = "plug <vbd>"
       vbd/<vbd>/frontend = <frontend> ("/local/domain/<guest>/...")
   Results on success:
       vbd/<vbd>/state = "ok"
       vbd/<vbd>/backend = <rel-backend>
          (<rel-backend> is the backend path relative to the
           driver domain's home directory in xenstore)
       vbd/<vbd>/b-copy/* may be created     } at least one of these
       <backend>/* may come into existence   }  must happen
   Next step (xenstore write) by toolstack:
       <frontend> created and populated, specifically
       <frontend>/backend = <backend>
            ("/local/domain/<driverdomid>/<rel-backend>")
       <backend> created if necessary
       <backend>/* copied from vbd/<vbd>/b-copy/* if any
       <backend>/frontend = <frontend> unless already set

 unplug <vbd>
   Preconditions:
       state ENOENT
       request ENOENT
       vbd/<vbd>/state "ok"
   Request:
       request = "unplug <vbd>"
       <frontend> ENOENT
   Results on success:
       vbd/<vbd>/state ENOENT
       <backend> ENOENT

The toolstack and driver domains should not store state of their own,
not required for these communication purposes, in the backendctrl/
directory in xenstore. If the driver domain wishes to make records
for its own use in xenstore, it should do so in a different directory
of its choice (eg, /local/domain/<driverdomid>/private/<something>).


Notes regarding driver domains whose block backend implementation is
controlled from the actual xenstore backend directory:

The b-copy/* feature exists for compatibility with some of these. If
such a backend cannot cope with the backend directory coming into
existence before the corresponding frontend directory, then it is
necessary to create and populate the backend in the same xenstore
transaction as the creation of the frontend.
However, such backends should be fixed; the b-copy/* feature is
deprecated and will be withdrawn at some point.

Note that a vbd may be created with the vdi inactive. In this case
the frontend and backend directories will exist, but the information
needed to start up the backend properly may be lacking until the vdi
is activated. For example, if the existence of a suitable block
device in the driver domain depends on vdi activation, the block
device id cannot be made known to the backend until after the backend
directory has already been created and perhaps has existed for some
time. It is believed that existing backends cope with this, because
they use a "hotplug script" approach - where the backend directory is
created without specifying the device node, and this backend directory
creation causes the invocation of machinery which establishes the
device node, which is subsequently written to xenstore.


Question

What about network interfaces and other kinds of backend?
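
To make the request/result handshake in the concrete protocol easier to follow, here is a minimal sketch of both sides for the four vdi operations. It is an illustration under stated assumptions, not an implementation of the proposal: a plain Python dict stands in for xenstore (no real xenstore bindings are used), the function names are invented, the target key `path` in the usage note is hypothetical, and the choice of EINVAL (22 in Xen's errno numbering) for an invalid transition is an assumption, since the proposal does not say which errno to return.

```python
from typing import Dict, Optional

XEN_EINVAL = 22  # EINVAL in Xen's errno numbering; its use here is assumed

Store = Dict[str, str]  # in-memory stand-in for xenstore: path -> value

# (current state, request) -> new state; None means the node is ENOENT.
VDI_TRANSITIONS = {
    (None, "prepare"): "inactive",
    ("inactive", "activate"): "active",
    ("active", "deactivate"): "inactive",
    ("inactive", "unprepare"): None,
    ("active", "unprepare"): None,
}


def vdi_path(driverdomid: int, vdi: str) -> str:
    """Control directory for one vdi, as laid out in the proposal."""
    return f"/local/domain/{driverdomid}/backendctrl/vdi/{vdi}"


def toolstack_request(store: Store, base: str, op: str,
                      target: Optional[Dict[str, str]] = None) -> None:
    """Toolstack side: write t/* (for prepare), then create request."""
    if f"{base}/request" in store:
        raise RuntimeError("an operation is already in progress")
    for key, value in (target or {}).items():
        store[f"{base}/t/{key}"] = value
    store[f"{base}/request"] = op


def driver_handle(store: Store, base: str) -> None:
    """Driver domain side: complete the pending request.

    A real driver domain would do all of this in one xenstore
    transaction; here the updates are ordinary dict operations.
    """
    op = store.pop(f"{base}/request")
    state = store.get(f"{base}/state")
    store.pop(f"{base}/result_msg", None)
    if (state, op) not in VDI_TRANSITIONS:
        store[f"{base}/result"] = str(XEN_EINVAL)
        store[f"{base}/result_msg"] = f"cannot {op} in state {state}"
        return
    new_state = VDI_TRANSITIONS[(state, op)]
    if new_state is None:
        store.pop(f"{base}/state", None)
    else:
        store[f"{base}/state"] = new_state
    store[f"{base}/result"] = "0"
```

For example, calling `toolstack_request(store, base, "prepare", {"path": "/dev/vg/guest"})` followed by `driver_handle(store, base)` leaves `state` set to "inactive", `result` set to "0", and no `request` node. The plug and unplug operations are deliberately left out of this sketch.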
> -----Original Message-----
> From: xen-devel-bounces@lists.xen.org [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of Ian Jackson
> Sent: 04 April 2012 16:47
> To: xen-devel@lists.xen.org
> Subject: [Xen-devel] Driver domains communication protocol proposal
>
> During some discussions and handwaving, including discussions with some
> experts on the XenServer/XCP storage architecture, we came up with what
> we think might be a plausible proposal for an architecture for
> communication between toolstack and driver domain, for storage at least.
>
> I offered to write it up. The abstract proposal is as I understand the
> consensus from our conversation. The concrete protocol is my own
> invention.
>
> Please comment. After a round of review here we should consider
> whether some of the assumptions need review from the communities
> involved in "other" backends (particularly, the BSDs).
>
> (FAOD the implementation of something like this is not 4.3 material, but it
> may inform some API decisions etc. we take in 4.2.)
>

I'm wondering how we should deal with driver domain re-starts
(possibly because of a crash). One of the compelling reasons for
using driver domains is the ability to re-start them, possibly
transparently to the frontend.

If a driver domain were to crash, I guess it would be the
responsibility of the tools to notice this and build a new one as
quickly as possible. A frontend could notice the loss of a driver
domain backend by, presumably a backend state watch firing followed
by an inability to read the backend state key, as presumably a clean
unplug should go through the usual closing->closed dance first. The
frontend could then, perhaps, stall I/O while the tools build a new
driver domain and re-build communications when it notices the
<frontend>/backend key get updated by the tools.

Does that sequence sound plausible?

  Paul
Paul Durrant writes ("RE: [Xen-devel] Driver domains communication protocol proposal"):
> I'm wondering how we should deal with driver domain re-starts
> (possibly because of a crash). One of the compelling reasons for
> using driver domains is the ability to re-start them, possibly
> transparently to the frontend.

Right.

> If a driver domain were to crash, I guess it would be the
> responsibility of the tools to notice this and build a new one as
> quickly as possible. A frontend could notice the loss of a driver
> domain backend by, presumably a backend state watch firing followed
> by an inability to read the backend state key,

No, I don't think anything would necessarily remove the backend from
xenstore. So the frontend shouldn't notice anything (other than a
stall, obviously) until the <frontend>/backend node was updated to
point to the replacement.

Ian.
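
The restart sequence discussed in this exchange can be sketched from the frontend's point of view: I/O stalls while the driver domain is gone, and the frontend only reacts once <frontend>/backend points somewhere new. This is a rough sketch of the idea only. The class `FrontendView`, its methods, and the string I/O tokens are invented for illustration; no real frontend driver or xenstore watch API is used, and replaying requests that were already in flight when the backend died is not modelled.

```python
from typing import List


class FrontendView:
    """Hypothetical frontend-side bookkeeping for one virtual disk."""

    def __init__(self, backend_path: str) -> None:
        self.backend_path = backend_path  # value of <frontend>/backend
        self.connected = True
        self.pending: List[str] = []      # I/O held back during a stall
        self.issued: List[str] = []       # I/O handed to the backend

    def backend_stalled(self) -> None:
        """The backend stopped responding; xenstore is unchanged."""
        self.connected = False

    def submit(self, io: str) -> None:
        """Issue the request, or queue it while the backend is away."""
        if self.connected:
            self.issued.append(io)
        else:
            self.pending.append(io)

    def on_backend_node_written(self, new_path: str) -> None:
        """Called when a watch on <frontend>/backend fires."""
        if new_path == self.backend_path:
            return  # not a replacement; keep waiting
        self.backend_path = new_path
        self.connected = True  # stands in for renegotiating the connection
        self.issued.extend(self.pending)
        self.pending.clear()
```

The point of the sketch is that the only trigger for reconnection is a change in the value of <frontend>/backend, matching the observation that nothing necessarily removes the old backend directory from xenstore.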
On Wed, Apr 4, 2012 at 4:46 PM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
> vdi
>   This host's intent to access a specific target.
>   Non-persistent, created on request by toolstack, enumerable.
>   Possible states: inactive/active.
>   Abstract operations: prepare, activate, deactivate, unprepare.

VDI as used by XenServer seems to mean "virtual disk instance", and as
such is actually persistent. I don't quite understand what it's
supposed to mean here, and how it differs from VBD (which in XenServer
terminology means "virtual block device").

 -George
George Dunlap writes ("Re: [Xen-devel] Driver domains communication protocol proposal"):
> On Wed, Apr 4, 2012 at 4:46 PM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
> > vdi
> >   This host's intent to access a specific target.
> >   Non-persistent, created on request by toolstack, enumerable.
> >   Possible states: inactive/active.
> >   Abstract operations: prepare, activate, deactivate, unprepare.
>
> VDI as used by XenServer seems to mean "virtual disk instance", and as
> such is actually persistent. I don't quite understand what it's
> supposed to mean here, and how it differs from VBD (which in XenServer
> terminology means "virtual block device").

One "vdi" in this sense can support multiple "vbd"s. A "vbd"
represents an attachment to a domain (or some other kind of provision
for use) whereas a "vdi" is a preparatory thing.

Feel free to suggest different terminology.

Ian.
On Tue, 2012-04-24 at 19:00 +0100, Ian Jackson wrote:
> George Dunlap writes ("Re: [Xen-devel] Driver domains communication protocol proposal"):
> > On Wed, Apr 4, 2012 at 4:46 PM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
> > > vdi
> > >   This host's intent to access a specific target.
> > >   Non-persistent, created on request by toolstack, enumerable.
> > >   Possible states: inactive/active.
> > >   Abstract operations: prepare, activate, deactivate, unprepare.
> >
> > VDI as used by XenServer seems to mean "virtual disk instance", and as
> > such is actually persistent. I don't quite understand what it's
> > supposed to mean here, and how it differs from VBD (which in XenServer
> > terminology means "virtual block device").
>
> One "vdi" in this sense can support multiple "vbd"s. A "vbd"
> represents an attachment to a domain (or some other kind of provision
> for use) whereas a "vdi" is a preparatory thing.
>
> Feel free to suggest different terminology.

What does the XCP SMAPI call these things? (Jon CCd)

Ian.
On Tue, Apr 24, 2012 at 7:00 PM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
> George Dunlap writes ("Re: [Xen-devel] Driver domains communication protocol proposal"):
>> On Wed, Apr 4, 2012 at 4:46 PM, Ian Jackson <Ian.Jackson@eu.citrix.com> wrote:
>> > vdi
>> >   This host's intent to access a specific target.
>> >   Non-persistent, created on request by toolstack, enumerable.
>> >   Possible states: inactive/active.
>> >   Abstract operations: prepare, activate, deactivate, unprepare.
>>
>> VDI as used by XenServer seems to mean "virtual disk instance", and as
>> such is actually persistent. I don't quite understand what it's
>> supposed to mean here, and how it differs from VBD (which in XenServer
>> terminology means "virtual block device").
>
> One "vdi" in this sense can support multiple "vbd"s. A "vbd"
> represents an attachment to a domain (or some other kind of provision
> for use) whereas a "vdi" is a preparatory thing.

Ah, so what you're calling "vdi" in this case is a thing into which
vbd's can plug -- what we might call the backend "node" for a
particular disk image?

So we have:

[A] <--> [B] <--> { [C], [D], [E] }

Where:
* A is the actual disk image on stable storage somewhere
* B is the instance of the code that can access A and provide access
to VMs which connect to it (not persistent)
* C D and E are instances of code running inside the guest which
connect to B and provide a block device to the guest OS which looks
like A (again not persistent)

Is that correct?

I think calling A a "virtual disk image" makes the most sense; reusing
that name for B is a bad idea given that it's already used for A in
XenServer terminology. (Jonathan, correct me if I'm wrong here.)

I think that calling C D and E "vbd"s also makes sense.

So we just need to have a good name for the running instance of a
blockback process / thread / whatever that accesses a particular VDI.
Virtual disk provider (VDP)? Block back instance (BBI)? Virtual block
backend (VBB)?

 -George
George Dunlap writes ("Re: [Xen-devel] Driver domains communication protocol proposal"):
> Ah, so what you're calling "vdi" in this case is a thing into which
> vbd's can plug -- what we might call the backend "node" for a
> particular disk image?

Yes.

> So we have:
>
> [A] <--> [B] <--> { [C], [D], [E] }
>
> Where:
> * A is the actual disk image on stable storage somewhere
> * B is the instance of the code that can access A and provide access
> to VMs which connect to it (not persistent)
> * C D and E are instances of code running inside the guest which
> connect to B and provide a block device to the guest OS which looks
> like A (again not persistent)
>
> Is that correct?

Yes.

> I think calling A a "virtual disk image" makes the most sense; reusing
> that name for B is a bad idea given that it's already used for A in
> XenServer terminology. (Jonathan, correct me if I'm wrong here.)

Right.

> I think that calling C D and E "vbd"s also makes sense.
>
> So we just need to have a good name for the running instance of a
> blockback process / thread / whatever that accesses a particular VDI.
> Virtual disk provider (VDP)? Block back instance (BBI)? Virtual block
> backend (VBB)?

Anything with "backend" in it is probably wrong because in general C,
D and E are backend/frontend pairs.

The thing that B has that A (the vdi) hasn't is that B has done all
the preparatory work necessary for accessing the vdi apart from
anything that involves exclusivity.

"nonexclusive image context" aka "nic" ? :-)
"nonexclusive image handle" aka "nih" :-)
"preparatory exclusive (not) image session" ?

Ian.