It seems that xenstore watches do not work from domU userspace (they obviously do for drivers). In some preliminary debugging of this, there are two aspects of the failure: not sending the watch to xenstore and not handling watch events for userspace watches. The former can be easily fixed by adding the missing ''case XS_WATCH:'' to xenbus_dev_write() in xenbus_dev.c. The latter is more difficult to fix. So before I tackle it, I wanted to see if anyone else was already working on a solution or just had some thoughts to share on possible solutions. Joseph Cihula (Linux) Software Security Architect Open Source Technology Center Intel Corp. *** These opinions are not necessarily those of my employer *** _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On Thu, Dec 15, 2005 at 03:33:08AM -0800, Cihula, Joseph wrote:> It seems that xenstore watches do not work from domU userspace (they > obviously do for drivers). > > In some preliminary debugging of this, there are two aspects of the > failure: not sending the watch to xenstore and not handling watch > events for userspace watches. The former can be easily fixed by adding > the missing ''case XS_WATCH:'' to xenbus_dev_write() in xenbus_dev.c. > The latter is more difficult to fix. So before I tackle it, I wanted to > see if anyone else was already working on a solution or just had some > thoughts to share on possible solutions.No-one''s working on this at the moment, as far as I am aware, and you are more than welcome to attack the problem. The problem is that the interface exposed through /proc/xen/xenbus does not handle the demultiplexing of watch events when they fire. It''s only simple bookkeeping that is missing: userspace processes will need to block on reading /proc/xen/xenbus, and then you need to o keep track of every register event so that you can demux the correct watches firing and unblock the correct reader; o tear down the state that you hold internally whenever someone sends an unregister message; o if the connection to userspace is closed before the watch is unregistered, unregister it on their behalf; o deliver watches to the kernel correctly too. The interface presented through /proc/xen/xenbus should be the same as that presented through the unix domain socket used by dom0, so you should be able to borrow lots of code from the existing xenstore library and move that into kernel space. Cheers, Ewan. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Cihula, Joseph
2005-Dec-31 05:38 UTC
RE: [Xen-devel] watches not working from domU userspace
On Thursday, December 22, 2005 7:34 AM, Ewan Mellor <mailto:ewan@xensource.com> wrote:> On Thu, Dec 15, 2005 at 03:33:08AM -0800, Cihula, Joseph wrote: > >> It seems that xenstore watches do not work from domU userspace (they >> obviously do for drivers). >> >> In some preliminary debugging of this, there are two aspects of the >> failure: not sending the watch to xenstore and not handling watch >> events for userspace watches. The former can be easily fixed by >> adding the missing ''case XS_WATCH:'' to xenbus_dev_write() in >> xenbus_dev.c. The latter is more difficult to fix. So before I >> tackle it, I wanted to see if anyone else was already working on a >> solution or just had some thoughts to share on possible solutions. > > No-one''s working on this at the moment, as far as I am aware, and you > are more > than welcome to attack the problem. The problem is that the > interface exposed > through /proc/xen/xenbus does not handle the demultiplexing of watch > events > when they fire. It''s only simple bookkeeping that is missing: > userspace > processes will need to block on reading /proc/xen/xenbus, and then > you need to > > o keep track of every register event so that you can demux the > correct > watches firing and unblock the correct reader; > o tear down the state that you hold internally whenever someone > sends an > unregister message; > o if the connection to userspace is closed before the watch is > unregistered, unregister it on their behalf; > o deliver watches to the kernel correctly too. > > The interface presented through /proc/xen/xenbus should be the same > as that > presented through the unix domain socket used by dom0, so you should > be able > to borrow lots of code from the existing xenstore library and move > that into > kernel space. > > Cheers, > > Ewan.This is the approach that I''ve taken so far (I had to pause my work to finish up another project). However, when the watch bookkeeping is combined with the transaction bookkeeping (which is already there) this is a lot of duplication with what xenstored already has to do for each connection. I''m wondering if it wouldn''t be better to add a sub-connection id to xenstored and xenbus so that each xenbus client would be seen by xenstored as a separate (sub)connection. Then watches and transactions would not need any special code in xenbus, as xenstored would handle the tracking and cleanup. This method would either need to change the xenstore wire protocol or co-opt the req_id field (which doesn''t seem to be used). While some additional code would have to be added to xenstored, I think that a fair amount of code (and many synchronization locks/mutexes) in xenbus could be removed. What do you think? Joseph Cihula (Linux) Software Security Architect Open Source Technology Center Intel Corp. *** These opinions are not necessarily those of my employer *** _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 31 Dec 2005, at 05:38, Cihula, Joseph wrote:> However, when the watch bookkeeping is combined with the transaction > bookkeeping (which is already there) this is a lot of duplication with > what xenstored already has to do for each connection. I''m wondering if > it wouldn''t be better to add a sub-connection id to xenstored and > xenbus > so that each xenbus client would be seen by xenstored as a separate > (sub)connection. Then watches and transactions would not need any > special code in xenbus, as xenstored would handle the tracking and > cleanup. > > This method would either need to change the xenstore wire protocol or > co-opt the req_id field (which doesn''t seem to be used). While some > additional code would have to be added to xenstored, I think that a > fair > amount of code (and many synchronization locks/mutexes) in xenbus could > be removed.How about add a new command XS_NEW_CONNECTION, taking a grant reference and an event channel? This would then set up a new inter-domain event channel and messaging memory page, and then commands would no longer go via the kernel at all. One drawback is it would then be very hard for the kernel to tell whether any xenstore transactions are in progress in user space. This is a problem since transaction state is not copied during save/restore. Either that needs to happen, or transactional xenstore code needs to handle ''spurious'' failures at any point in a transaction (handle EAGAIN on any transactional read/write, not just on commit). -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Cihula, Joseph
2006-Jan-02 11:24 UTC
RE: [Xen-devel] watches not working from domU userspace
On Saturday, December 31, 2005 1:53 AM, Keir Fraser wrote:> On 31 Dec 2005, at 05:38, Cihula, Joseph wrote: > >> However, when the watch bookkeeping is combined with the transaction >> bookkeeping (which is already there) this is a lot of duplication >> with what xenstored already has to do for each connection. I''m >> wondering if it wouldn''t be better to add a sub-connection id to >> xenstored and xenbus so that each xenbus client would be seen by >> xenstored as a separate (sub)connection. Then watches and >> transactions would not need any special code in xenbus, as xenstored >> would handle the tracking and cleanup. >> >> This method would either need to change the xenstore wire protocol or >> co-opt the req_id field (which doesn''t seem to be used). While some >> additional code would have to be added to xenstored, I think that a >> fair amount of code (and many synchronization locks/mutexes) in >> xenbus could be removed. > > How about add a new command XS_NEW_CONNECTION, taking a grant > reference > and an event channel? This would then set up a new inter-domain event > channel and messaging memory page, and then commands would no longer > go via the kernel at all. > > One drawback is it would then be very hard for the kernel to tell > whether any xenstore transactions are in progress in user space. This > is a problem since transaction state is not copied during > save/restore. > Either that needs to happen, or transactional xenstore code needs to > handle ''spurious'' failures at any point in a transaction (handle > EAGAIN on any transactional read/write, not just on commit). > > -- KeirIt seems like this would add a lot of overhead. Xenstore commands are typically only a couple hundred bytes and since they are synchronous (per client and excluding watches), this would leave a lot of the page unused. Moreover, for transient clients, such as xenstore-{read, write, rm}, the extra setup/teardown would also add to the cost. Joe _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 2 Jan 2006, at 11:24, Cihula, Joseph wrote:> It seems like this would add a lot of overhead. Xenstore commands are > typically only a couple hundred bytes and since they are synchronous > (per client and excluding watches), this would leave a lot of the page > unused. Moreover, for transient clients, such as xenstore-{read, > write, > rm}, the extra setup/teardown would also add to the cost.Those overheads will certainly be insignificant. Far more important concerns are a clean design and small (as possible) implementation, with as little code/logic duplication as possible. Implementing a proxy xenstored in the kernel for the use of userland code is not a nice solution. :-) -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On Sat, Dec 31, 2005 at 09:52:49AM +0000, Keir Fraser wrote:> > On 31 Dec 2005, at 05:38, Cihula, Joseph wrote: > > >However, when the watch bookkeeping is combined with the transaction > >bookkeeping (which is already there) this is a lot of duplication with > >what xenstored already has to do for each connection. I''m wondering if > >it wouldn''t be better to add a sub-connection id to xenstored and > >xenbus > >so that each xenbus client would be seen by xenstored as a separate > >(sub)connection. Then watches and transactions would not need any > >special code in xenbus, as xenstored would handle the tracking and > >cleanup. > > > >This method would either need to change the xenstore wire protocol or > >co-opt the req_id field (which doesn''t seem to be used). While some > >additional code would have to be added to xenstored, I think that a > >fair > >amount of code (and many synchronization locks/mutexes) in xenbus could > >be removed. > > How about add a new command XS_NEW_CONNECTION, taking a grant reference > and an event channel? This would then set up a new inter-domain event > channel and messaging memory page, and then commands would no longer go > via the kernel at all. > > One drawback is it would then be very hard for the kernel to tell > whether any xenstore transactions are in progress in user space. This > is a problem since transaction state is not copied during save/restore. > Either that needs to happen, or transactional xenstore code needs to > handle ''spurious'' failures at any point in a transaction (handle EAGAIN > on any transactional read/write, not just on commit).Could Xenstored not simply "save up" the EAGAIN until the commit? If a domain migrated in the middle of a transaction, then Xenstored would see a transaction handle that was invalid for that connection. In that case, it could simply ignore all writes until a commit, in which case it would return EAGAIN, just as if the transaction had seen a conflicting write. I think that forcing clients to handle EAGAIN for all messages, not just commit, would be undesirable. Ewan. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 3 Jan 2006, at 18:12, Ewan Mellor wrote:> Could Xenstored not simply "save up" the EAGAIN until the commit? If > a domain > migrated in the middle of a transaction, then Xenstored would see a > transaction handle that was invalid for that connection. In that > case, it > could simply ignore all writes until a commit, in which case it would > return > EAGAIN, just as if the transaction had seen a conflicting write. > > I think that forcing clients to handle EAGAIN for all messages, not > just > commit, would be undesirable.The alternative is to allow the restored client to see inconsistent data. Currently we ensure that a transaction sees a consistent snapshot of the store by copying the database file at the start of a transaction. If we do not copy that snapshot as part of save/restore (we don;t currently) then in-progress transactions in a restored guest cannot be guaranteed to see data consistent with their snapshot in later reads. Seems to me that is likely to have more subtle issues than simply returning EAGAIN and requiring the client to just have to deal with it. If the client has to be robust against seeing weird inconsistent data then it is likely to have a bail-and-retry path for many transactional reads anyway. May as well make the issue explicit. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Cihula, Joseph
2006-Jan-07 01:54 UTC
RE: [Xen-devel] watches not working from domU userspace
On Tuesday, January 03, 2006 10:13 AM, Ewan Mellor <> wrote:> On Sat, Dec 31, 2005 at 09:52:49AM +0000, Keir Fraser wrote: > >> How about add a new command XS_NEW_CONNECTION, taking a grant >> reference and an event channel? This would then set up a new >> inter-domain event channel and messaging memory page, and then >> commands would no longer go via the kernel at all.This is a bigger change than I was expecting, but sometimes (often ;-) doing the right thing is harder than doing the easy thing. I''ll start working on this approach. This also facilitates a useful follow-on change: modify XS_INTRODUCE to take a grant reference instead of an mfn and use that as the command (instead of XS_NEW_CONNECTION) to establish communication with the userspace clients. This would have the additional benefit of not requiring xenstored''s domain to be privileged just to map pages for its clients. Of course, it would mean altering the domain builder code to create a grant reference for the xenbus page. On Saturday, December 31, 2005 1:53 AM, Keir Fraser wrote:> One drawback is it would then be very hard for the kernel to tell > whether any xenstore transactions are in progress in user space. This > is a problem since transaction state is not copied during > save/restore. > Either that needs to happen, or transactional xenstore code needs to > handle ''spurious'' failures at any point in a transaction (handle > EAGAIN on any transactional read/write, not just on commit).On Wednesday, January 04, 2006 2:38 AM, Keir Fraser wrote:> On 3 Jan 2006, at 18:12, Ewan Mellor wrote: > >> Could Xenstored not simply "save up" the EAGAIN until the commit? >> If a domain migrated in the middle of a transaction, then Xenstored >> would see a transaction handle that was invalid for that connection. >> In that case, it could simply ignore all writes until a commit, in >> which case it would return EAGAIN, just as if the transaction had >> seen a conflicting write. >> >> I think that forcing clients to handle EAGAIN for all messages, not >> just commit, would be undesirable. > > The alternative is to allow the restored client to see inconsistent > data. Currently we ensure that a transaction sees a consistent > snapshot of the store by copying the database file at the start of a > transaction. If we do not copy that snapshot as part of save/restore > (we don''t currently) then in-progress transactions in a restored guest > cannot be guaranteed to see data consistent with their snapshot in > later reads. > > Seems to me that is likely to have more subtle issues than simply > returning EAGAIN and requiring the client to just have to deal with > it. If the client has to be robust against seeing weird inconsistentdata> then it is likely to have a bail-and-retry path for many transactional > reads anyway. May as well make the issue explicit.I think that I''ll go with Keir''s approach for the initial patch, since it will be easy to detect the invalid transaction handle and just fail all successive calls. I suspect that it would be both complex and problematic to try to return meaningful data to a resumed client, especially if you''re just going to fail the transaction anyway (writes could be dropped but reads would be an issue). Improvements could always be added later. Joseph Cihula (Linux) Software Security Architect Open Source Technology Center Intel Corp. *** These opinions are not necessarily those of my employer *** _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 7 Jan 2006, at 01:54, Cihula, Joseph wrote:> This is a bigger change than I was expecting, but sometimes (often ;-) > doing the right thing is harder than doing the easy thing. I''ll start > working on this approach. > > This also facilitates a useful follow-on change: modify XS_INTRODUCE > to > take a grant reference instead of an mfn and use that as the command > (instead of XS_NEW_CONNECTION) to establish communication with the > userspace clients. This would have the additional benefit of not > requiring xenstored''s domain to be privileged just to map pages for its > clients. Of course, it would mean altering the domain builder code to > create a grant reference for the xenbus page.We definitely want that change anyway (and also for the console page). We already reserve the first 8 entries in every grant table to allow us to implement this in future. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On 7 Jan 2006, at 10:06, Cihula, Joseph wrote:> The connection sud-id approach also has the option of keeping the same > suspend behavior since that only depends on snooping the communication > and not maintaining any state (other than the mutex). And if drivers > use the sub-ids, I think that more transaction and watch bookkeeping > could be removed from xenbus. And the grant table change to > XS_INTRODUCE that I mentioned works either way. > > Are there additional benefits to the grants approach besides just > removing xenbus from the communication path?Yes, I think maybe you are right. I missed that the connection id can conveniently be used to demux watch events. That given, I think piggybacking connections is the way to go after all. Reusing the request_id as a conn_id should work fine, since all current clients just set that field to zero. Do you intend to introduce new commands like XS_CREATE_CONNECTION / XS_DESTROY_CONNECTION? I think that those ought to be executable only by the root connection (connection 0). -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel