harry
2005-Oct-24 14:46 UTC
[Xen-devel] States for correct establishment and tear-down of an inter-domain communication channel.
I've attached a diagram of the state machine I came up with. It's fairly subtle and this is the fourth attempt, so there's a chance that this version is also somehow fundamentally flawed.

In the end I went with a 'ready' node and used the event-channel and ring-reference to represent the connected state. This is because the state machine doesn't make use of the info until the other side gets to the connected state.

The green states are the normal path, the red states are internal errors, the orange states are when we are disconnecting, and the blue state is where we tear down and go around again to reconnect because the remote side has reconnected. The state machine would be simpler without the internal errors.

Here are some comments about the state machine (cut and pasted from the implementation) to aid in understanding:

/*
 * The state machine below makes the following assumptions:
 *
 * 1) The store might contain some stale cruft from the last time our device
 *    driver failed. This is a handy assumption for testing new versions of
 *    the driver but isn't strictly necessary.
 *
 * 2) If the other side generates a protocol error on the inter-domain
 *    connection then we attempt to disconnect and reconnect. An alternative
 *    behaviour would be to wait for our interface to be called to disconnect
 *    and then reconnect before retrying.
 *
 * 3) If we experience an internal failure (failing to register a watch, for
 *    example) then we attempt to disconnect and wait for our interface to be
 *    called to disconnect (by module unload, for example) and reconnect
 *    before retrying.
 *
 * 4) Connection and disconnection of the channel is done in two phases: the
 *    first phase makes the local resources available to the remote side, the
 *    second phase uses the resources of the remote side to complete the
 *    connection.
 *
 * The key for the stimuli is as follows:
 *
 * cn: interface called to connect the channel when the interface state is
 *     disconnected (for example on module load).
 *
 * pe: protocol error detected when the channel is in a state between phase
 *     two connected and the completion of the phase two disconnect callback.
 *
 * dn: interface called to disconnect the channel when the interface state
 *     is connected (for example on module unload).
 *
 * ou: a synchronous stimulus from the response test_other_state which
 *     indicates that the other state is still unknown because the watch
 *     callback hasn't happened yet. Can only happen when making the response
 *     test_other_state.
 *
 * od: other state is disconnected. This is both a synchronous stimulus
 *     from test_other_state and an asynchronous stimulus from the watch
 *     function. Disconnected means that we can't read at least the other
 *     side's ready node from the store.
 *
 * or: other state is ready. This is both a synchronous stimulus from
 *     test_other_state and an asynchronous stimulus from the watch function.
 *     Ready means that we can see the other side's ready node but not the
 *     ring-reference and event-channel information.
 *
 * oc: other state is connected. This is both a synchronous stimulus from
 *     test_other_state and an asynchronous stimulus from the watch function.
 *     Connected means that we found both the ready node and the connected
 *     information in the store.
 *
 * If the values of the connected information change when the other side is
 * connected then we generate the 'oc' stimulus again, which forces a
 * reconnect.
 *
 * rs: an asynchronous response was successful (we only make one response at
 *     a time, so all asynchronous responses have the same completion
 *     stimuli).
 *
 * rf: an asynchronous response failed. Only register_watch, clear_store,
 *     write_ready, write_connected and phase_two_connect can fail. Only
 *     phase_two_connect has a good reason for failure: the other side might
 *     have passed bogus parameters; the other failures are poor API design
 *     and ought to be promoted to domain failures.
 *
 * The state machine responses are as follows:
 *
 * test_other_state: what state we currently think the other side is in, as
 *     reflected by the last watch event. Synchronous (called with the lock
 *     held); completes with ou/od/or/oc.
 *
 * register_watch: register a watch on the other side.
 *
 * unregister_watch: unregister the watch.
 *
 * clear_store: remove the ready node and connected information.
 *
 * write_ready: write the ready node to the store.
 *
 * write_connected: write the connected information to the store.
 *
 * phase_one_connect: grant the remote side access to the local page etc.
 *
 * phase_two_connect: map the remote page etc.
 *
 * phase_two_disconnect: unmap the remote page.
 *
 * phase_one_disconnect: revoke the access of the remote side.
 *
 * complete_disconnect: when our interface is called to get us to disconnect
 *     the channel, we quiesce and disconnect and then call this to indicate
 *     we are done.
 */

Enjoy,

Harry.

On Mon, 2005-10-17 at 21:14 +1000, Rusty Russell wrote:
> Hi Harry,
>
> Did some more thinking about state diagram. It is far simpler if (1)
> we assume that failures all terminate the device (ie. wait for device
> deletion), and (2) simply treat all changes the same, whether a resume
> (backend change) or tools changing some configuration stuff. It's not
> complete, but useful for seeing what a nicer xenbus interface would look
> like.
> I think something like the following:
>
> ->create()
>     // Allocate, read tool-written fields
>     // Xenbus watches backend
>
> ->open()
>     // Write fields for backend to read
>     // Xenbus ensures backend not connected anymore
>
> ->close()
>     // Abort connection to backend, remove fields for be
>     // Xenbus removes connected field
>
> ->change()
>     // Re-read tool-written fields.
>
> ->destroy()
>     // Deallocate.
>
> Grammar:
>     LIFE := create() CONN destroy()
>     CONN := change()* OPENCLOSE change()*
>     OPENCLOSE := open() change()* close()
>
> Anyway, here's the first cut (haven't sent out yet, since my brain is
> still a little fried and I want to sleep on it).
>
> State Transition Diagram for Xen Skeleton Front End Device
>
> Events:
>     da: Device appears (tools create directory in store w/ initial fields incl. backend)
>     dd: Device destruction (tools remove directory from store)
>     dc: Device changes (tools alter fields in store)
>     db: Backend changes (restore)
>     rm: Module remove
>
> Unless otherwise referenced, failure puts into "fail" state, which can
> only be resolved by destroying the device.
>
> i: Initial state
>     Device does not exist, only one event possible.
>
>     da: goto i_da: read initial fields, watch backend
>         (be: backend info exists, bd: backend info doesn't exist)
>
> i_da:
>     We need to make sure backend isn't still connected so it
>     notices us coming up. Check if backend has "connected" node
>     (ce: "connected" exists, cd: "connected" doesn't exist)
>
>     ce: goto i_da: report error that other end still connected
>     cd: goto i_da_cd: create and write info for backend
>     db: goto i_da: move watch to new backend
>     rm: goto i_da_rm: unwatch backend, free resources
>
> i_da_cd:
>     Backend is not connected to old frontend, we can set up.
>
>     dd: goto i: delete info for backend, unwatch backend, free resources
>     dc: goto i_da_cd: update fe info
>     db: goto i_da: move watch to new backend, delete info for backend
>     be: goto i_da_cd_be: read and store backend info,
>         write "connected" field
>     rm: goto i_da_rm: remove info for backend, unwatch backend,
>         free resources
>
> i_da_cd_be:
>     Fully connected.
>
>     dd: goto i: unwatch backend, free resources
>     dc: goto i_da_cd_be: update fe info
>     db: goto i_da: abort connection, move watch to new backend,
>         remove info for backend
>     bd: goto i_da: remove "connected", remove info for backend,
>         abort connection
>     rm: goto i_da_rm: remove "connected", abort connection, remove info
>         for backend, unwatch backend, free resources
>
> i_da_rm:
>     Remove module

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
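The od/or/oc definitions in Harry's stimulus key reduce to two questions about the store: is the other side's ready node readable, and is its ring-reference/event-channel information readable. Below is a minimal C sketch of that classification. It assumes a hypothetical `struct other_side_view` that the watch callback fills in; the struct, its fields, the signature given to `test_other_state()` and the helper `watch_stimulus()` are illustrative and not taken from the implementation. Delivering a watch stimulus only when the classification changes is also an assumption; the comment only says that a change to the connected information re-delivers 'oc'.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of what the last watch callback read from the store
 * about the other side. Struct and field names are assumptions. */
struct other_side_view {
    bool watch_fired;       /* false until the first watch callback has run */
    bool ready_present;     /* other side's ready node is readable */
    bool connected_present; /* ring-reference and event-channel are readable */
    uint32_t ring_ref;      /* meaningful only when connected_present */
    uint32_t evtchn;        /* meaningful only when connected_present */
};

/* Completion stimuli of test_other_state, as in the stimulus key. */
enum other_state { OTHER_OU, OTHER_OD, OTHER_OR, OTHER_OC };

/* Synchronous classification: ou until the watch has fired, otherwise
 * od/or/oc depending on which nodes are readable. */
static enum other_state test_other_state(const struct other_side_view *v)
{
    if (!v->watch_fired)
        return OTHER_OU;  /* watch callback hasn't happened yet */
    if (!v->ready_present)
        return OTHER_OD;  /* can't read at least the ready node */
    if (!v->connected_present)
        return OTHER_OR;  /* ready node but no connected information */
    return OTHER_OC;      /* ready node and connected information */
}

/* Hypothetical watch-path helper: returns true (and stores the stimulus in
 * *out) when the classification changed, or when the other side stayed
 * connected but its ring-reference or event-channel changed, in which case
 * 'oc' is delivered again to force a reconnect. */
static bool watch_stimulus(const struct other_side_view *prev,
                           const struct other_side_view *cur,
                           enum other_state *out)
{
    enum other_state before = test_other_state(prev);
    enum other_state after = test_other_state(cur);

    *out = after;
    if (after != before)
        return true;
    return after == OTHER_OC &&
           (cur->ring_ref != prev->ring_ref || cur->evtchn != prev->evtchn);
}
```

Checking the ready node before the connected information means that connected information without a ready node is classified as od, matching "can't read at least the other side's ready node".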
harry
2005-Oct-24 14:56 UTC
Re: [Xen-devel] States for correct establishment and tear-down of an inter-domain communication channel.
Sorry, I found a typo in the dot source I used to generate the postscript: the transition from i_cn to i_cn_rs should be labelled rs not dn.
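Rusty's grammar quoted in the first message (LIFE := create() CONN destroy(), CONN := change()* OPENCLOSE change()*, OPENCLOSE := open() change()* close()) is regular, so it can be checked with a small state machine over the sequence of interface calls. A minimal C sketch follows, reading the grammar literally, so exactly one open()/close() pair is allowed per device lifetime. The `enum call` names and `life_is_valid()` are hypothetical and not part of any xenbus API.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names for the five interface calls in the grammar. */
enum call { CALL_CREATE, CALL_OPEN, CALL_CLOSE, CALL_CHANGE, CALL_DESTROY };

/* One state per position in LIFE := create() change()* open() change()*
 * close() change()* destroy(). */
enum life_state { L_START, L_PRE_OPEN, L_OPEN, L_POST_CLOSE, L_DONE };

/* Returns true iff calls[0..n) is a sentence of the grammar. */
static bool life_is_valid(const enum call *calls, size_t n)
{
    enum life_state s = L_START;

    for (size_t i = 0; i < n; i++) {
        switch (calls[i]) {
        case CALL_CREATE:
            if (s != L_START)
                return false;
            s = L_PRE_OPEN;
            break;
        case CALL_CHANGE:
            /* change() may appear before open(), inside OPENCLOSE and
             * after close(), but never before create() or after destroy() */
            if (s != L_PRE_OPEN && s != L_OPEN && s != L_POST_CLOSE)
                return false;
            break;
        case CALL_OPEN:
            if (s != L_PRE_OPEN)
                return false;
            s = L_OPEN;
            break;
        case CALL_CLOSE:
            if (s != L_OPEN)
                return false;
            s = L_POST_CLOSE;
            break;
        case CALL_DESTROY:
            if (s != L_POST_CLOSE)
                return false;
            s = L_DONE;
            break;
        default:
            return false;
        }
    }
    return s == L_DONE; /* the lifetime must end with destroy() */
}
```

Each state corresponds to a position in the grammar: change() is accepted before open(), between open() and close(), and after close(), while every other call is accepted in exactly one state.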