Jeremy Fitzhardinge
2010-Jul-27 00:36 UTC
[Xen-devel] xl create should refuse to share block devices RW between domains
When creating a domain, "xl create" should fail if a block device is shared RW between domains, like xm create does. I''m not sure how this would be implemented. Search xenstore for references to the device when setting up a domain? J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2010-Jul-27 08:18 UTC
Re: [Xen-devel] xl create should refuse to share block devices RW between domains
On Tue, 2010-07-27 at 01:36 +0100, Jeremy Fitzhardinge wrote:> When creating a domain, "xl create" should fail if a block device is > shared RW between domains, like xm create does. > > I''m not sure how this would be implemented. Search xenstore for > references to the device when setting up a domain?The hotplug scripts have locking and calls to a function called "check_device_sharing" in them, I''ve been wondering why that wasn''t kicking in for xl created domains for a little while but never got to investigating. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stefano Stabellini
2010-Jul-27 10:40 UTC
Re: [Xen-devel] xl create should refuse to share block devices RW between domains
On Tue, 27 Jul 2010, Ian Campbell wrote:> On Tue, 2010-07-27 at 01:36 +0100, Jeremy Fitzhardinge wrote: > > When creating a domain, "xl create" should fail if a block device is > > shared RW between domains, like xm create does. > > > > I''m not sure how this would be implemented. Search xenstore for > > references to the device when setting up a domain? > > The hotplug scripts have locking and calls to a function called > "check_device_sharing" in them, I''ve been wondering why that wasn''t > kicking in for xl created domains for a little while but never got to > investigating. >those scripts are called by udev and theoretically should work exactly the same way with xend or libxl. I didn''t test this but I believe that since libxl always uses blktap2, the script called is block and the codepath taken is the following: phys=$(xenstore_read_default "$XENBUS_PATH/physical-device" ''MISSING'') if [ "$phys" != ''MISSING'' ] then # Depending upon the hotplug configuration, it is possible for this # script to be called twice, so just bail. exit 0 fi so we never do any checks. I also think that tap_ctl_create is the right place to do these checks, not a script called by udev after the device has been created. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2010-Jul-27 10:55 UTC
Re: [Xen-devel] xl create should refuse to share block devices RW between domains
On Tue, 2010-07-27 at 11:40 +0100, Stefano Stabellini wrote:> On Tue, 27 Jul 2010, Ian Campbell wrote: > > On Tue, 2010-07-27 at 01:36 +0100, Jeremy Fitzhardinge wrote: > > > When creating a domain, "xl create" should fail if a block device is > > > shared RW between domains, like xm create does. > > > > > > I''m not sure how this would be implemented. Search xenstore for > > > references to the device when setting up a domain? > > > > The hotplug scripts have locking and calls to a function called > > "check_device_sharing" in them, I''ve been wondering why that wasn''t > > kicking in for xl created domains for a little while but never got to > > investigating. > > > > those scripts are called by udev and theoretically should work > exactly the same way with xend or libxl. > I didn''t test this but I believe that since libxl always uses blktap2, > the script called is block and the codepath taken is the following: > > phys=$(xenstore_read_default "$XENBUS_PATH/physical-device" ''MISSING'') > if [ "$phys" != ''MISSING'' ] > then > # Depending upon the hotplug configuration, it is possible for this > # script to be called twice, so just bail. > exit 0 > fi > > so we never do any checks....and in any case physical-device would be a unique /dev/xen/tapdisk-N path, regardless of any sharing of the files tapdisk is backing onto so it is rather hard to check for sharing at this level anyway.> I also think that tap_ctl_create is the right place to do these checks, > not a script called by udev after the device has been created.Agreed. tapdisk should be taking out a flock() or something similar on any vhd files it is going to write to and should fail if it can''t lock the file. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Jackson
2010-Jul-27 15:17 UTC
[Xen-devel] Re: xl create should refuse to share block devices RW between domains
Jeremy Fitzhardinge writes ("xl create should refuse to share block devices RW between domains"):> When creating a domain, "xl create" should fail if a block device is > shared RW between domains, like xm create does. > > I''m not sure how this would be implemented. Search xenstore for > references to the device when setting up a domain?Does this safety catch have to be 100% reliable ? If not then there are some useful heuristics: for example, you could see whether the device is a devmapper device and if so check its open count. That would catch simultaneously mounting the fs in dom0. Doesn''t the kernel already have some features to try to stop multiple conflicting uses of the same block device ? Perhaps blktap[2] should do the same ? Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2010-Jul-27 15:18 UTC
Re: [Xen-devel] xl create should refuse to share block devices RW between domains
On 07/27/2010 03:55 AM, Ian Campbell wrote:>> I also think that tap_ctl_create is the right place to do these checks, >> not a script called by udev after the device has been created. > Agreed. tapdisk should be taking out a flock() or something similar on > any vhd files it is going to write to and should fail if it can''t lock > the file.I''m not sure that would work for physical device/lvm backing. J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2010-Jul-27 15:23 UTC
[Xen-devel] Re: xl create should refuse to share block devices RW between domains
On 07/27/2010 08:17 AM, Ian Jackson wrote:> Jeremy Fitzhardinge writes ("xl create should refuse to share block devices RW between domains"): >> When creating a domain, "xl create" should fail if a block device is >> shared RW between domains, like xm create does. >> >> I''m not sure how this would be implemented. Search xenstore for >> references to the device when setting up a domain? > Does this safety catch have to be 100% reliable ? If not then there > are some useful heuristics: for example, you could see whether the > device is a devmapper device and if so check its open count. That > would catch simultaneously mounting the fs in dom0.Well, my specific use case is that I have pairs of domain configs, one PV, one HVM, referring to the same set of resources. I want xl create to catch when I try and create the PV version of a domain while the HVM is still running. A more comprehensive check would be nice, but just this would be useful. But whatever it does check should be 100% reliable.> Doesn''t the kernel already have some features to try to stop multiple > conflicting uses of the same block device ? Perhaps blktap[2] should > do the same ?Not in general. The only interlock the kernel has is that you can''t change the partition mapping on a device while it is in use. The e2fsprogs do generally try to stop mistakes like fscking a mounted filesystem, but it fails to detect if the block device is being used by blktap (and that does make quite a bit of a mess, it turns out...). J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Jackson
2010-Jul-27 15:33 UTC
[Xen-devel] Re: xl create should refuse to share block devices RW between domains
Jeremy Fitzhardinge writes ("Re: xl create should refuse to share block devices RW between domains"):> Well, my specific use case is that I have pairs of domain configs, one > PV, one HVM, referring to the same set of resources. I want xl create > to catch when I try and create the PV version of a domain while the HVM > is still running.Mmm. Of course an HVM domain needs to open the underlying device twice, once for blktap and once for qemu.> A more comprehensive check would be nice, but just this would be > useful. But whatever it does check should be 100% reliable.Well, I guess I meant: 1. Do we have to catch every possible conflict ? If so then your e2fsprogs example is one we need to consider, and we will have to add a new kernel feature which can prevent e2fsprogs from opening the block device, or simulate "mounting" it or something. 1b. If not, then which conflicts are we trying to detect ? 2. If we catch a particular combination (eg, start two domains at once using the same storage resources) does our check have to be race-free ? That may make it more complicated - and if the answer to my first question is "no" there will be some things which are inherently racy (eg, spotting mounting a domain''s disk vs. starting a domain with a disk which is mounted). NB that when we fix the bug that you can start multiple domains with the same name, you''ll be able to avoid your PV/HVM accident by specifying the same name in each config file. Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2010-Jul-27 15:47 UTC
Re: [Xen-devel] xl create should refuse to share block devices RW between domains
On Tue, 2010-07-27 at 16:18 +0100, Jeremy Fitzhardinge wrote:> On 07/27/2010 03:55 AM, Ian Campbell wrote: > >> I also think that tap_ctl_create is the right place to do these checks, > >> not a script called by udev after the device has been created. > > Agreed. tapdisk should be taking out a flock() or something similar on > > any vhd files it is going to write to and should fail if it can''t lock > > the file. > > I''m not sure that would work for physical device/lvm backing.That should be covered by the existing checks for shared devices in the hotplug scripts. I''m not suggesting those aren''t currently broken though ;-) Ian. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Ian Campbell
2010-Jul-27 15:50 UTC
Re: [Xen-devel] Re: xl create should refuse to share block devices RW between domains
I don''t have anything helpful add but just to mention that whatever solution is arrived at needs to work with localhost live migrate (which has historically been broken by any change in this area). Ian. On Tue, 2010-07-27 at 16:33 +0100, Ian Jackson wrote:> Jeremy Fitzhardinge writes ("Re: xl create should refuse to share block devices RW between domains"): > > Well, my specific use case is that I have pairs of domain configs, one > > PV, one HVM, referring to the same set of resources. I want xl create > > to catch when I try and create the PV version of a domain while the HVM > > is still running. > > Mmm. Of course an HVM domain needs to open the underlying device > twice, once for blktap and once for qemu. > > > A more comprehensive check would be nice, but just this would be > > useful. But whatever it does check should be 100% reliable. > > Well, I guess I meant: > > 1. Do we have to catch every possible conflict ? If so then > your e2fsprogs example is one we need to consider, and we > will have to add a new kernel feature which can prevent e2fsprogs > from opening the block device, or simulate "mounting" it or > something. > > 1b. If not, then which conflicts are we trying to detect ? > > 2. If we catch a particular combination (eg, start two domains at > once using the same storage resources) does our check have to be > race-free ? That may make it more complicated - and if the answer > to my first question is "no" there will be some things which are > inherently racy (eg, spotting mounting a domain''s disk > vs. starting a domain with a disk which is mounted). > > NB that when we fix the bug that you can start multiple domains with > the same name, you''ll be able to avoid your PV/HVM accident by > specifying the same name in each config file. > > Ian. > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel_______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Jeremy Fitzhardinge
2010-Jul-27 16:01 UTC
[Xen-devel] Re: xl create should refuse to share block devices RW between domains
On 07/27/2010 08:33 AM, Ian Jackson wrote:> Jeremy Fitzhardinge writes ("Re: xl create should refuse to share block devices RW between domains"): >> Well, my specific use case is that I have pairs of domain configs, one >> PV, one HVM, referring to the same set of resources. I want xl create >> to catch when I try and create the PV version of a domain while the HVM >> is still running. > Mmm. Of course an HVM domain needs to open the underlying device > twice, once for blktap and once for qemu.Good point. For stubdoms that''s pretty obvious if you consider the stubdom to be part of the main domain. For dom0-resident qemu it you need a more subtle check which can match a domain to its corresponding qemu.>> A more comprehensive check would be nice, but just this would be >> useful. But whatever it does check should be 100% reliable. > Well, I guess I meant: > > 1. Do we have to catch every possible conflict ? If so then > your e2fsprogs example is one we need to consider, and we > will have to add a new kernel feature which can prevent e2fsprogs > from opening the block device, or simulate "mounting" it or > something. > > 1b. If not, then which conflicts are we trying to detect ?I think domU vs domU conflicts are the most important, since they can come about from normal operation of the tools. dom0 vs domU requires someone doing something non-tools-related.> 2. If we catch a particular combination (eg, start two domains at > once using the same storage resources) does our check have to be > race-free ? That may make it more complicated - and if the answer > to my first question is "no" there will be some things which are > inherently racy (eg, spotting mounting a domain''s disk > vs. starting a domain with a disk which is mounted).Yes, it needs to be race-free. In general I think that specific invocations of "xl" should be considered atomic operations. For something like domain creation, it should be able to gather and reserve all the resources required before actually starting the domain, and if it can''t fail and release everything with a useful message. A test which can fail in certain racy circumstances is useless.> NB that when we fix the bug that you can start multiple domains with > the same name, you''ll be able to avoid your PV/HVM accident by > specifying the same name in each config file.BTW, that''s another bug. xl create doesn''t prevent two domains with the same name from being created... J _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Stefano Stabellini
2010-Jul-27 16:07 UTC
[Xen-devel] Re: xl create should refuse to share block devices RW between domains
On Tue, 27 Jul 2010, Jeremy Fitzhardinge wrote:> On 07/27/2010 08:33 AM, Ian Jackson wrote: > > Jeremy Fitzhardinge writes ("Re: xl create should refuse to share block devices RW between domains"): > >> Well, my specific use case is that I have pairs of domain configs, one > >> PV, one HVM, referring to the same set of resources. I want xl create > >> to catch when I try and create the PV version of a domain while the HVM > >> is still running. > > Mmm. Of course an HVM domain needs to open the underlying device > > twice, once for blktap and once for qemu. > > Good point. For stubdoms that''s pretty obvious if you consider the > stubdom to be part of the main domain. For dom0-resident qemu it you > need a more subtle check which can match a domain to its corresponding qemu. > > >> A more comprehensive check would be nice, but just this would be > >> useful. But whatever it does check should be 100% reliable. > > Well, I guess I meant: > > > > 1. Do we have to catch every possible conflict ? If so then > > your e2fsprogs example is one we need to consider, and we > > will have to add a new kernel feature which can prevent e2fsprogs > > from opening the block device, or simulate "mounting" it or > > something. > > > > 1b. If not, then which conflicts are we trying to detect ? > > I think domU vs domU conflicts are the most important, since they can > come about from normal operation of the tools. dom0 vs domU requires > someone doing something non-tools-related. > > > 2. If we catch a particular combination (eg, start two domains at > > once using the same storage resources) does our check have to be > > race-free ? That may make it more complicated - and if the answer > > to my first question is "no" there will be some things which are > > inherently racy (eg, spotting mounting a domain''s disk > > vs. starting a domain with a disk which is mounted). > > Yes, it needs to be race-free. In general I think that specific > invocations of "xl" should be considered atomic operations. For > something like domain creation, it should be able to gather and reserve > all the resources required before actually starting the domain, and if > it can''t fail and release everything with a useful message. A test > which can fail in certain racy circumstances is useless.I think the checks should be in tap_ctl_create. _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel
Daniel Stodden
2010-Jul-27 18:17 UTC
Re: [Xen-devel] xl create should refuse to share block devices RW between domains
On Tue, 2010-07-27 at 06:55 -0400, Ian Campbell wrote:> On Tue, 2010-07-27 at 11:40 +0100, Stefano Stabellini wrote: > > On Tue, 27 Jul 2010, Ian Campbell wrote: > > > On Tue, 2010-07-27 at 01:36 +0100, Jeremy Fitzhardinge wrote: > > > > When creating a domain, "xl create" should fail if a block device is > > > > shared RW between domains, like xm create does. > > > > > > > > I''m not sure how this would be implemented. Search xenstore for > > > > references to the device when setting up a domain? > > > > > > The hotplug scripts have locking and calls to a function called > > > "check_device_sharing" in them, I''ve been wondering why that wasn''t > > > kicking in for xl created domains for a little while but never got to > > > investigating. > > > > > > > those scripts are called by udev and theoretically should work > > exactly the same way with xend or libxl. > > I didn''t test this but I believe that since libxl always uses blktap2, > > the script called is block and the codepath taken is the following: > > > > phys=$(xenstore_read_default "$XENBUS_PATH/physical-device" ''MISSING'') > > if [ "$phys" != ''MISSING'' ] > > then > > # Depending upon the hotplug configuration, it is possible for this > > # script to be called twice, so just bail. > > exit 0 > > fi > > > > so we never do any checks. > > ...and in any case physical-device would be a unique /dev/xen/tapdisk-N > path, regardless of any sharing of the files tapdisk is backing onto so > it is rather hard to check for sharing at this level anyway.Nope. The xl/tapctl code does a reverse map from the leaf type:/path to the minor number, so the result would be a shared device node. Provided the same physical node isn''t accessed through some alias. But think at some point in the not so far future we''ll start resolving (fs;node) pairs anyway, to better identify storage types. Also provided that xl create won''t race. The tap-ctl calls don''t serialize themselves. All that is different from blktap1 altogether, where sharing e.g. a vhd is causing disaster in any case. I guess xl/xend doesn''t have a config bit for sharing disks? On a single host it''s altogether possible, although not exactly popular. On shared storage the locking doesn''t buy you anything, that''s part of the reason why nobody ever cared to implement it. XCP used to work with killing metadata headers in VHD for a little while. Exactly until people got aware of the crash resilience issues involved :o)> > I also think that tap_ctl_create is the right place to do these checks, > > not a script called by udev after the device has been created. > > Agreed. tapdisk should be taking out a flock() or something similar on > any vhd files it is going to write to and should fail if it can''t lock > the file.Yes. If so, it would have to be tapdisk doing extra checks during tap-ctl open, not tap-ctl. For the above reasons, that only helps not breaking the metadata by running two tapdisk on non-sharable image containers. The guest fs is still at risk. So it''s either hotplug. I optimistically believe a lot in userspace problem solving. Or, if paranoia prevails, it could be relatively safely done at a lower level in blkback. Daniel _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel