Hello,
I no longer have the original message, so I'm replying to a
copy-paste from the Xen mailing list archive. Sorry for the inconvenience.
> During some discussions and handwaving, including discussions with
> some experts on the Xenserver/XCP storage architecture, we came up
> with what we think might be a plausible proposal for an architecture
> for communication between toolstack and driver domain, for storage at
> least.
>
> I offered to write it up. The abstract proposal is as I understand
> the consensus from our conversation. The concrete protocol is my own
> invention.
>
> Please comment. After a round of review here we should consider
> whether some of the assumptions need review from the communities
> involved in "other" backends (particularly, the BSDs).
>
> (FAOD the implementation of something like this is not 4.3 material,
> but it may inform some API decisions etc. we take in 4.2.)
>
> Ian.
>
>
> Components
>
> toolstack
>
> guest
> Might be the toolstack domain, or an (intended) guest vm.
>
> driver domain
> Responsible for providing the disk service to guests.
> Consists, internally, of (at least):
> control plane
> backend
> but we avoid exposing this internal implementation detail.
>
> We permit different driver domains on a single host, serving
> different guests or the same guests.
>
> The toolstack is expected to know the domid of the driver domain.
>
> driver domain kind
> We permit different "kinds" of driver domain, perhaps implemented
> by completely different code, which support different facilities.
>
> Each driver domain kind needs to document what targets (see
> below) are valid and how they are specified, and what preparatory
> steps may need to be taken eg at system boot.
>
> Driver domain kinds do not have a formal presence in the API.
>
> Objects
>
> target
> A kind of name.
>
> Combination of a physical location and data format plus all other
> information needed by the underlying mechanisms, or relating to
> the data format, needed to access it.
>
> These names are assigned by the driver domain kind; the names may
> be an open class; no facility provided via this API to enumerate
> these.
>
> Syntactically, these are key/value pairs, mapping short string
> keys to shortish string values, suitable for storage in a
> xenstore directory.
>
> vdi
> This host's intent to access a specific target.
> Non-persistent, created on request by toolstack, enumerable.
> Possible states: inactive/active.
> Abstract operations: prepare, activate, deactivate, unprepare.
>
> (We call the "create" operation for this object "prepare" to
> avoid confusion with other kinds of "create".)
>
> The toolstack promises that no two vdis for the same target
> will simultaneously be active, even if the two vdis are on
> different hosts.
>
> vbd
> Provision of a facility for a guest to access a particular target
> via a particular vdi. There may be zero or more of these at any
> point for a particular vdi.
>
> Non-persistent, created on request by toolstack, enumerable.
> Abstract operations: plug, unplug.
>
> (We call the "create" operation for this object "plug" to avoid
> confusion with other kinds of "create".)
>
> vbds may be created/destroyed, and the underlying vdi
> activated/deactivated, in any order. However IO is only possible
> to a vbd when the corresponding vdi is active. The reason for
> requiring activation as a separate step is to allow as much of
> the setup for an incoming migration domain's storage to be done
> before committing to the migration and entering the "domain is
> down" stage, during which access is switched from the old to the
> new host.
>
> We will consider here the case of a vbd which provides
> service as a Xen vbd backend. Other cases (eg, the driver domain
> is the same as the toolstack domain and the vbd provides a block
> device in the toolstack domain) can be regarded as
> optimisations/shortcuts.
>
> Concrete protocol
>
> The toolstack gives instructions to the driver domain, and receives
> results, via xenstore, in the path:
> /local/domain/<driverdomid>/backendctrl/vdi
> Both driver domain and toolstack have write access to the whole of
> this area.
>
> Each vdi which has been requested and/or exists, corresponds to a
> path .../backendctrl/vdi/<vdi> where <vdi> is a string (of
> alphanumerics, hyphens and underscores) chosen by the toolstack.
> Inside this, there are the following nodes:
>
> /local/domain/<driverdomid>/backendctrl/vdi/<vdi>/
> state The current state. Values are "inactive", "active",
> or ENOENT meaning the vdi does not exist.
> Set by the driver domain in response to requests.
>
> request Operation requested by the toolstack and currently
> being performed. Created by the toolstack, but may
> then not be modified by the toolstack. Deleted
> by the driver domain when the operation has completed.
>
> The values of "request" are:
> prepare
> activate
> deactivate
> unprepare
> plug <vbd>
> unplug <vbd>
> <vbd> is an id chosen by the toolstack like <vdi>
>
> result errno value (in decimal, Xen error number) best
> describing the results of the most recently completed
> operation; 0 means success. Created or set by the
> driver domain in the same transaction as it deletes
> request. The toolstack may delete this.
>
> result_msg Optional UTF-8 string explaining any error; does not
> exist when result is "0". Created or deleted by the
> driver domain whenever the driver domain sets result.
> The toolstack may delete this.
>
> t/* The target name. Must be written by the toolstack.
> But may not be removed or changed while either of
> state or request exist.
>
> vbd/<vbd>/state
> The state of a vbd, "ok" or ENOENT.
> Set or deleted by the driver domain in response to
> requests.
>
> vbd/<vbd>/frontend
> The frontend path (complete path in xenstore) which the
> xen vbd should be servicing. Set by the toolstack
> with the plug request and not modified until after
> completion of unplug.
>
> vbd/<vbd>/backend
> The backend path (complete path in xenstore) which the
> driver domain has chosen for the vbd. Set by the
> driver domain in response to a plug request.
>
> vbd/<vbd>/b-copy/*
> The driver domain may request, in response to plug,
> that the toolstack copy these values to the specified
> backend directory, in the same transaction as it
> creates the frontend. Set by the driver domain in
> response to a plug request; may be deleted by the
> toolstack. DEPRECATED, see below.
>
> The operations:
>
> prepare
> Creates a vdi from a target.
> Preconditions:
> state ENOENT
> request ENOENT
> Request (xenstore writes by toolstack):
> request = "prepare"
> t/* as appropriate
> Results on success (xenstore writes by driver domain):
> request ENOENT } applies to success from all operations,
> result = "0" } will not be restated below
> state = "inactive"
> Results on error (applies to all operations): }
> request ENOENT } applies
> result = some decimal integer errno value } to all
> result_msg = ENOENT or a string } failures
>
> activate
> Preconditions:
> state = "inactive"
> request ENOENT
> Request:
> request = "activate"
> Results on success:
> state = "active"
>
> deactivate
> Preconditions:
> state = "active"
> request ENOENT
> Request:
> request = "deactivate"
> Results on success:
> state = "inactive"
>
> unprepare
> Preconditions:
> state != ENOENT
> request ENOENT
> Request:
> request = "unprepare"
> Results on success:
> state = ENOENT
>
> removal, modification, etc. of an unprepared vdi:
> Preconditions:
> state ENOENT
> request ENOENT
> Request:
> any changes to <vdi> directory which do
> not create "state" or "request"
> Results:
> ignored - no response from driver domain
>
> plug <vbd>
> Preconditions:
> state ENOENT
I'm not sure about this, but shouldn't state be "active", or at least
"prepared"? Maybe I don't understand the protocol correctly, but to be
able to plug a vbd, shouldn't the underlying vdi be prepared first?
Also, as far as I understand, each vdi only has one vbd, so why is the
<vbd> parameter needed in both the plug and unplug operations?
> request ENOENT
> vbd/<vbd>/state ENOENT
> <frontend> ENOENT
> Request:
> request = "plug <vbd>"
> vbd/<vbd>/frontend = <frontend> ("/local/domain/<guest>/...")
> Results on success:
> vbd/<vbd>/state = "ok"
> vbd/<vbd>/backend = <rel-backend>
> (<rel-backend> is the backend path relative to the
> driver domain's home directory in xenstore)
> vbd/<vbd>/b-copy/* may be created } at least one of these
> <backend>/* may come into existence } must happen
> Next step (xenstore write) by toolstack:
> <frontend> created and populated, specifically
> <frontend>/backend = <backend>
> ("/local/domain/<driverdomid>/<rel-backend>")
> <backend> created if necessary
> <backend>/* copied from vbd/<vbd>/b-copy/* if any
> <backend>/frontend = <frontend> unless already set
>
> unplug <vbd>
> Preconditions:
> state ENOENT
> request ENOENT
> vbd/<vbd>/state "ok"
> Request:
> request = "unplug <vbd>"
> <frontend> ENOENT
> Results on success:
> vbd/<vbd>/state ENOENT
> <backend> ENOENT
So the flow of the protocol is (if everything returns success):
connection: prepare -> activate -> plug
disconnection: unplug -> deactivate -> unprepare
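
To check my reading of the vdi part, here is a minimal Python sketch of
the four vdi operations and their preconditions as quoted above. The
class, the table and the use of None for ENOENT are my own invention,
only the state names and operation names come from the proposal. I left
plug/unplug out, since I'm unsure about their preconditions.

```python
# Sketch of the vdi lifecycle from the quoted proposal.
# Names (Vdi, TRANSITIONS) are hypothetical, not part of any Xen API.
ENOENT = None  # stands in for "the state node does not exist"

# operation -> (states it may start from, state after success)
TRANSITIONS = {
    "prepare":    ({ENOENT}, "inactive"),
    "activate":   ({"inactive"}, "active"),
    "deactivate": ({"active"}, "inactive"),
    "unprepare":  ({"inactive", "active"}, ENOENT),  # state != ENOENT
}


class Vdi:
    def __init__(self):
        self.state = ENOENT

    def request(self, op):
        """Apply op if its precondition holds, else raise ValueError."""
        allowed, new_state = TRANSITIONS[op]
        if self.state not in allowed:
            raise ValueError(f"{op!r} not allowed in state {self.state!r}")
        self.state = new_state
        return self.state


# Walk through the vdi operations in order.
vdi = Vdi()
for op in ("prepare", "activate", "deactivate", "unprepare"):
    vdi.request(op)
```

If I got the table right, unprepare is accepted from both "inactive" and
"active", so the deactivate step is not strictly required before it.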
>
> The toolstack and driver domains should not store state of their own,
> not required for these communication purposes, in the backendctrl/
> directory in xenstore. If the driver domain wishes to make records
> for its own use in xenstore, it should do so in a different directory
> of its choice (eg, /local/domain/<driverdomid>/private/<something>).
>
>
> Notes regarding driver domains whose block backend implementation is
> controlled from the actual xenstore backend directory:
>
> The b-copy/* feature exists for compatibility with some of these. If
> such a backend cannot cope with the backend directory coming into
> existence before the corresponding frontend directory, then it is
> necessary to create and populate the backend in the same xenstore
> transaction as the creation of the frontend. However, such backends
> should be fixed; the b-copy/* feature is deprecated and will be
> withdrawn at some point.
>
> Note that a vbd may be created with the vdi inactive. In this case
So in this case, the connection may happen with:
connection: prepare -> plug -> activate?
I frankly find this vbd/vdi naming very confusing.
> the frontend and backend directories will exist, but the information
> needed to start up the backend properly may be lacking until the vdi
> is activated. For example, if the existence of a suitable block
> device in the driver domain depends on vdi activation, the block
> device id cannot be made known to the backend until after the backend
> directory has already been created and perhaps has existed for some
> time. It is believed that existing backends cope with this, because
> they use a "hotplug script" approach - where the backend directory is
> created without specifying the device node, and this backend directory
> creation causes the invocation of machinery which establishes the
> device node, which is subsequently written to xenstore.
>
>
> Question
>
> What about network interfaces and other kinds of backend ?
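
One more note on the request/result handshake described earlier in the
quoted text. This is how I understand it, sketched in Python with a plain
dict standing in for one .../backendctrl/vdi/<vdi>/ directory (a missing
key plays the role of ENOENT). The function names, the KEEP sentinel and
the "t/path" key with its value are made up for illustration. A real
implementation would do these writes through xenstore transactions.

```python
# In-memory stand-in for one vdi directory in xenstore.
# All names here are hypothetical; nothing below is a real xenstore call.
KEEP = object()  # sentinel: leave "state" untouched


def toolstack_request(node, op):
    """Toolstack side: create "request" only when none is pending."""
    if "request" in node:
        raise RuntimeError("an operation is already in progress")
    node.pop("result", None)      # the toolstack may delete these
    node.pop("result_msg", None)
    node["request"] = op


def driver_complete(node, errno=0, msg=None, state=KEEP):
    """Driver domain side: delete "request" and set "result" together
    (the proposal asks for this to be one transaction)."""
    del node["request"]
    node["result"] = str(errno)
    node.pop("result_msg", None)  # must not exist when result is "0"
    if errno != 0 and msg is not None:
        node["result_msg"] = msg
    if errno == 0 and state is not KEEP:
        if state is None:         # None means the vdi no longer exists
            node.pop("state", None)
        else:
            node["state"] = state


node = {"t/path": "/dev/example"}  # made-up target key and value
toolstack_request(node, "prepare")
driver_complete(node, state="inactive")
```

On failure the sketch leaves "state" as it was and only sets result and
(optionally) result_msg, which is how I read the "Results on error" part.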