David Edmondson
2007-Feb-26 16:24 UTC
[Xen-devel] [PATCH] Require that xenstored writes to a domain complete in a single chunk
If xenstored is part-way through writing a header+payload into the buffer shared with a guest domain when the guest domain decides to suspend, the buffer is corrupted, as xenstored doesn''t know that it has a partial write to complete when the domain revives. The domain is expecting proper completion of the partial header+payload and is disappointed. The attached patch modifies xenstored such that it checks for sufficient space for header+payload before making any changes to the shared buffer. It is against 3.0.4-1, but the code in unstable looks the same. dme. -- David Edmondson, Sun Microsystems, http://www.dme.org _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2007-Feb-26 17:20 UTC
Re: [Xen-devel] [PATCH] Require that xenstored writes to a domain complete in a single chunk
On 26/2/07 16:24, "David Edmondson" <dme@sun.com> wrote:> If xenstored is part-way through writing a header+payload into the > buffer shared with a guest domain when the guest domain decides to > suspend, the buffer is corrupted, as xenstored doesn''t know that it > has a partial write to complete when the domain revives. The domain > is expecting proper completion of the partial header+payload and is > disappointed. > > The attached patch modifies xenstored such that it checks for > sufficient space for header+payload before making any changes to the > shared buffer. > > It is against 3.0.4-1, but the code in unstable looks the same.This seems dubious. There''s no reason we might not have payloads bigger than the ring size (which is only 1kB). The right fix would be in the guest, which should already be stopping any transactions or commands across save/restore. Does this problem occur when xenstored sends an asynchronous watch-fired message? Probably the packet-reading thread should be interrupted and put to sleep before suspending. For older guest compatibility perhaps we can take a variant of your patch that only waits for enough space is the entire message fits in the ring in one go. This would be ''best-effort'' at compatibility while not precluding use of larger messages in general. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
David Edmondson
2007-Feb-26 17:48 UTC
[Xen-devel] Re: [PATCH] Require that xenstored writes to a domain complete in a single chunk
* keir@xensource.com [2007-02-26 17:20:34]> This seems dubious. There''s no reason we might not have payloads > bigger than the ring size (which is only 1kB).Agreed.> The right fix would be in the guest, which should already be > stopping any transactions or commands across save/restore. Does this > problem occur when xenstored sends an asynchronous watch-fired > message?All of the cases I examined (a few dozen) were for watch events.> Probably the packet-reading thread should be interrupted > and put to sleep before suspending.I''ll look at this.> For older guest compatibility perhaps we can take a variant of your > patch that only waits for enough space is the entire message fits in > the ring in one go. This would be ''best-effort'' at compatibility > while not precluding use of larger messages in general.Is the implication that you think that this problem could occur with a Linux guest (I''ve never seen it, though have tested much less)? dme. -- David Edmondson, Sun Microsystems, http://www.dme.org _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2007-Feb-26 18:14 UTC
Re: [Xen-devel] Re: [PATCH] Require that xenstored writes to a domain complete in a single chunk
On 26/2/07 17:48, "David Edmondson" <dme@sun.com> wrote:>> For older guest compatibility perhaps we can take a variant of your >> patch that only waits for enough space is the entire message fits in >> the ring in one go. This would be ''best-effort'' at compatibility >> while not precluding use of larger messages in general. > > Is the implication that you think that this problem could occur with a > Linux guest (I''ve never seen it, though have tested much less)?The Linux suspend thread does not sync with the xenbus reader thread at all. I''m not sure why we''ve never seen any problems on Linux, but I guess it''s rare that a message cannot be sent all in one go. Especially a watch event, as those are usually fairly short. Oh..... Wait a minute! On Linux we explicitly tear down watches before suspend. Or at least we used to, before a patch of a few weeks ago (c/s 13519, Jan 19th 2007). This would save us because no watches registered -> no watches fire. Do you not have anything similar to this in your xenbus code (presumably you took the dual-licensed files as a basis for the Solaris implementation)? So Linux now needs fixing too, but the bug window here has been only a few weeks and no supported kernel releases include the bug. This being the case we should probably just fix this issue in current guest kernels. -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
John Levon
2007-Feb-26 19:09 UTC
Re: [Xen-devel] Re: [PATCH] Require that xenstored writes to a domain complete in a single chunk
On Mon, Feb 26, 2007 at 06:14:05PM +0000, Keir Fraser wrote:> The Linux suspend thread does not sync with the xenbus reader thread at all. > I''m not sure why we''ve never seen any problems on Linux, but I guess it''s > rare that a message cannot be sent all in one go. Especially a watch event, > as those are usually fairly short.Presumably we have to make sure we''ve waited for all threads waiting for a response to have woken up from the sleep-until-reply too, though; I''m not sure we do that. regards john _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Keir Fraser
2007-Feb-26 23:12 UTC
Re: [Xen-devel] Re: [PATCH] Require that xenstored writes to a domain complete in a single chunk
On 26/2/07 19:09, "John Levon" <levon@movementarian.org> wrote:>> The Linux suspend thread does not sync with the xenbus reader thread at all. >> I''m not sure why we''ve never seen any problems on Linux, but I guess it''s >> rare that a message cannot be sent all in one go. Especially a watch event, >> as those are usually fairly short. > > Presumably we have to make sure we''ve waited for all threads waiting for > a response to have woken up from the sleep-until-reply too, though; I''m > not sure we do that.Request-response pairs are serialised on a mutex. This mutex is locked out across save/restore, which means that save/restore cannot proceed until any outstanding response has been fully received. There''s nothing even halfway tricky going on here. :-) -- Keir _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel