I updated bug #515. If you want to suggest an approach for a solution, I''ll have a go at putting a patch together. Harry _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On Thu, Mar 23, 2006 at 11:45:25AM +0000, Harry Butterworth wrote:> I updated bug #515. If you want to suggest an approach for a solution, > I''ll have a go at putting a patch together.Good catch, Harry! If you look at the block and xen-hotplug-cleanup scripts, you''ll see that they claim a lock to make sure that the teardown doesn''t interfere with the check when a device comes up. I think that a similar thing for vif teardown would suffice. I''m still trying to keep the parallelism in the scripts, where it''s safe to do so, because I don''t want to put things in the way of device bringup, but if it''s risky to do that, then a lock to serialise every vif operation, bringup and bringdown, ought to be safest. I''ll leave it up to you to decide how coarse-grained that locking ought to be. Cheers, Ewan. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On Thu, 2006-03-23 at 14:06 +0000, Ewan Mellor wrote:> On Thu, Mar 23, 2006 at 11:45:25AM +0000, Harry Butterworth wrote: > > > I updated bug #515. If you want to suggest an approach for a solution, > > I''ll have a go at putting a patch together. > > Good catch, Harry! > > If you look at the block and xen-hotplug-cleanup scripts, you''ll see > that they claim a lock to make sure that the teardown doesn''t interfere > with the check when a device comes up. I think that a similar thing for > vif teardown would suffice.I saw the block locking. The problem isn''t mutual exclusion of the vif-route and cleanup scripts it''s that the cleanup script must be serialised _after_ the vif-route script. I think the kernel can reorder the event injection into user space which is why udevsend uses sequence numbers to put events back in-order. So, in the case where the cleanup event is injected first and gets the lock first, the problem will still occur. With no ordering guarantee on the event transport between kernel and user space, the options I could think of were: 1) Use sequence numbers and reserialise events as is done by udevsend---the easiest way to do this is to drop support for hotplug and require udev. 2) Change the protocol such that the kernel code waits for the offline script to complete before issuing the cleanup event. This would require a state change in the backend when the offline script completes which would trigger the backend to unregister the xenbus device. I''m not sure that I understand the full implications of this. 3) Somehow combine the offline and cleanup into one event. I don''t know exactly how. 4) Use the lock even though we think the design is flawed and hope it will work most of the time. Any better ideas? Harry _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
On Thu, Mar 23, 2006 at 02:32:34PM +0000, Harry Butterworth wrote:> On Thu, 2006-03-23 at 14:06 +0000, Ewan Mellor wrote: > > On Thu, Mar 23, 2006 at 11:45:25AM +0000, Harry Butterworth wrote: > > > > > I updated bug #515. If you want to suggest an approach for a solution, > > > I''ll have a go at putting a patch together. > > > > Good catch, Harry! > > > > If you look at the block and xen-hotplug-cleanup scripts, you''ll see > > that they claim a lock to make sure that the teardown doesn''t interfere > > with the check when a device comes up. I think that a similar thing for > > vif teardown would suffice. > > I saw the block locking. > > The problem isn''t mutual exclusion of the vif-route and cleanup scripts > it''s that the cleanup script must be serialised _after_ the vif-route > script. > > I think the kernel can reorder the event injection into user space which > is why udevsend uses sequence numbers to put events back in-order. > > So, in the case where the cleanup event is injected first and gets the > lock first, the problem will still occur. > > With no ordering guarantee on the event transport between kernel and > user space, the options I could think of were: > > 1) Use sequence numbers and reserialise events as is done by > udevsend---the easiest way to do this is to drop support for hotplug and > require udev. > > 2) Change the protocol such that the kernel code waits for the offline > script to complete before issuing the cleanup event. This would require > a state change in the backend when the offline script completes which > would trigger the backend to unregister the xenbus device. I''m not sure > that I understand the full implications of this. > > 3) Somehow combine the offline and cleanup into one event. I don''t know > exactly how. > > 4) Use the lock even though we think the design is flawed and hope it > will work most of the time.Yes, well I was kind of hoping we''d get away with 4) ! Especially since the move to 2.6.16 and onwards might move people towards udev in any case. I''m not sure number 3 would work, so I''m inclined to avoid that. Number 2 sounds like the best option if you really want to do this properly -- register a watch on "vif/<domid>/<devid>/hotplug-teardown-status", issue the offline event, and then only call device_unregister once hotplug-teardown-status goes to "done". Cheers, Ewan. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel