Andy Smith
2007-Nov-24 18:37 UTC
[Pkg-xen-devel] Bug#452721: xen-utils-common: "xendomains" does not restore domains in same order as it would start them
Package: xen-utils-common
Version: 3.0.3-0-2
Severity: wishlist

The "xendomains" init script will start domains according to the order of config files found in /etc/xen/auto/*. I use this so that, in the event of a hard reboot, the more important domains will start first. Some of these contain essential services like DNS resolvers, slapd and so on, and since starting xen domains happens in series and can be quite time consuming, it is rather useful to have them around first before everything else starts up.

Unfortunately though it seems that when restoring domains from a save file in /var/lib/xen/save/* the ordering is alphabetical. It would be great if restoring from savefile could be ordered in the same way as starting from cold.

-- System Information:
Debian Release: 4.0
APT prefers stable
APT policy: (500, 'stable')
Architecture: i386 (i686)
Shell: /bin/sh linked to /bin/bash
Kernel: Linux 2.6.18-4-xen-686
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8)

Versions of packages xen-utils-common depends on:
ii lsb-base 3.1-23.2etch1 Linux Standard Base 3.1 init scrip
ii udev 0.105-4 /dev/ and hotplug management daemo

xen-utils-common recommends no packages.

-- no debconf information
Elliott Mitchell
2019-Apr-02 02:35 UTC
[Pkg-xen-devel] Bug#452721: #452721 is kind of important
found 452721 4.8.5+shim4.10.2+xsa282-1+deb9u11
quit

I'm inclined to suggest #452721 is actually a bit more than merely wishlist. The ordering of domain start/restore/stop/save can be extremely important. The current behavior of the xendomains init script is rather simplistic.

I would argue for the use of tagging along the lines of what was standardized for init scripts. Domains acting as LDAP/NIS/syslog would need to start before fileserver domains. Mailserver domains would start after nearly all other domains had started. These would likely be stopped in /almost/ the reverse order (there could be reasons for stopping/suspending them in an unrelated order).

Rather more interestingly, one might desire some domains to start in parallel with some services started by init scripts; yet others to start near the end of the init process. Perhaps modify /etc/init.d/xendomains to be called multiple times during system startup/shutdown? Maybe there could be links /etc/rc2.d/S02xendomains:early, /etc/rc2.d/S03xendomains:middle, and /etc/rc2.d/X04xendomains:late which start "early", "middle", and "late" domains?

Pretty much it really needs to be redone from the ground up.

Related, but perhaps a distinct issue, is what happens when one runs `/etc/init.d/xendomains reload`. The current behavior is to pause/stop all domains and then resume/start all domains. One use for this is when qemu-system-i386 gets security updates. In such a case the desired behavior is to stop and then start each domain in turn (which results in shorter downtimes for each domain, even though the whole process takes just as long).

--
(\___(\___(\______ --=> 8-) EHM <=-- ______/)___/)___/) \BS ( | ehem+sigmsg at m5p.com PGP 87145445 | ) / \_CS\ | _____ -O #include <stddisclaimer.h> O- _____ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
I'm surprised #452721 is tagged moreinfo since it seems simple, but that may depend on installation capability. Note, I am not the original reporter, so I might actually be observing something distinct. I doubt this, but I cannot be certain. Issue is this, a hypervisor machine could have tens or even hundreds of VMs. There could be ordering dependencies during startup and shutdown. Notably there are core services, such as LDAP, DHCP, fileserver and DNS. Often these need to be up before anything else and they may need to come up in a particular order. Most often the LDAP server (which can be a distinct VM) needs to be up first. Meanwhile for downtimes, a fileserver (which can also be a VM) needs to go down last. During a full downtime when all VMs were fully shut down, this effect can be achieved by including numbers in the filename. Say /etc/xen/auto/0_ldap.cfg, /etc/xen/auto/1_fileserver.cfg, /etc/xen/auto/9_everything_else.cfg. If the hypervisor is rebooted and VMs are saved to /var/lib/xen/save; they will be paused in identifier order, but saved by domain name. When scanning /var/lib/xen/save, `xendomains` goes by filename which means VMs are restored in a distinct (and often problematic) order. A minimal solution would be for `xendomains` to save VMs in /var/lib/xen/save <domId>-<name> and then use `sort -n` during restore. A better approach would be to have a LSB style header specifying dependencies to flag VMs which should be saved or shutdown late, and VMs which should be saved or shutdown early. A ridiculous overkill solution might be to turn the /etc/xen/*.cfg files into full init scripts. This could be done by having a script which understood domain configuration files well enough to identify the name/UUID and then start/stop the domain as specified by $1. Use that script as the interpreter (#! line), then it could find the configuration via $0. Then normal init script handling tools could take care of ordering. 
(geeze, that really does actually seem kind of like a semi-workable solution despite seeming rather crazy at first) -- (\___(\___(\______ --=> 8-) EHM <=-- ______/)___/)___/) \BS ( | ehem+sigmsg at m5p.com PGP 87145445 | ) / \_CS\ | _____ -O #include <stddisclaimer.h> O- _____ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
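As an illustration of the "minimal solution" above (save files named <domId>-<name>, restored with `sort -n`), here is a small Python sketch. It only shows how a numeric sort differs from the plain filename sort; the function name `restore_order` and the example file names are made up for this sketch and are not taken from the real `xendomains` script.

```python
import re

def restore_order(save_files):
    """Order save-file names by their leading number, roughly like `sort -n`."""
    def key(name):
        match = re.match(r"(\d+)", name)
        # Names without a numeric prefix count as 0, as `sort -n` treats them.
        return (int(match.group(1)) if match else 0, name)
    return sorted(save_files, key=key)

# Hypothetical save-file names of the form <domId>-<name>.
files = ["12-mail", "3-fileserver", "1-ldap", "100-web"]

print(sorted(files))         # plain filename order
print(restore_order(files))  # numeric order
```

With these names the plain sort puts "100-web" ahead of "12-mail" and "3-fileserver", while the numeric sort keeps the low-numbered domains first.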
Andy Smith
2021-Sep-27 17:13 UTC
[Pkg-xen-devel] Bug#452721: Bug#452721: #452721 moreinfo?
Hi Elliott, On Sun, Sep 26, 2021 at 08:07:58PM -0700, Elliott Mitchell wrote:> During a full downtime when all VMs were fully shut down, this effect > can be achieved by including numbers in the filename. Say > /etc/xen/auto/0_ldap.cfg, /etc/xen/auto/1_fileserver.cfg, > /etc/xen/auto/9_everything_else.cfg.I also do this to control start up order, though I use a prefix of NNN-. The main missing functionality from my point of view is not being able to control the order of save/shutdown. As you say the script for saving everything or shutting everything down just does a read of all existing domids and does the action on them one by one in increasing order. I think the "auto" directory is a pretty good and simple interface, so how about using it for save/shutdown as well? So, instead of just enumerating all running domids, enumerate all files in /etc/xen/auto/ in REVERSE order, parsing the name of the domain out of each one and doing the action on that name. When all files have been exhausted, THEN do the action on any remaining running domains. This has the advantages of: - still working even if administrator does not use ordering in /etc/xen/auto. Filename format there does not change from what it is now, where ordering is already possible but is optional. - being quite obvious behaviour - save/shutdown order is reverse of start order. That seems like a good minimal improvement, but if one wanted to explicitly control save/shutdown order then perhaps the next enhancement could be an /etc/xen/shutdown/ directory with similar purpose to the "auto" one? i.e.: 1. Enumerate files in "shutdown" directory in reverse order, getting name from each and doing shutdown action on it 2. If there were no files there, instead use "auto" directory for this purpose 3. Then do shutdown action on every remaining running domain as usual Again this still results in everything getting a shutdown action if administrator does not want to do any of this. 
It's an open question for me whether step 2 (falling back to enumerating "auto" directory) only happens when "shutdown" directory is empty or if it should happen all of the time.

If you had a dom0 with 100 domains on it but only wanted to control the order of a few of them, without fallback you would need to copy ALL the links from auto to shutdown and then change their ordering because otherwise this would shut down the ones you specified and then do all the rest in domid order like it does right now.

WITH fallback, you'd get the few you wanted to control done in the order you expect and then you'd get the order from "auto", which is appealing but does mean it's going to try to shut down again some that are already shut down. If there is a relatively quick "is a domain by this name still running?" check then maybe that's workable.

> If the hypervisor is rebooted and VMs are saved to /var/lib/xen/save;
> they will be paused in identifier order, but saved by domain name. When
> scanning /var/lib/xen/save, `xendomains` goes by filename which means VMs
> are restored in a distinct (and often problematic) order.
>
> A minimal solution would be for `xendomains` to save VMs in
> /var/lib/xen/save <domId>-<name> and then use `sort -n` during restore.

If by this you mean it would be good if the "save all" action picked the filename from the filename in the "auto" directory, to replicate that directory's ordering, then I agree.

If however you mean the actual Xen domid of the running domain then I'm not sure what that would buy us. If I had a domain with a filename of 010-ldap0.cfg it might get started first and have domid 1, but then I reboot it and it has domid 99, I wouldn't want it saved as /var/lib/xen/save/99-ldap0, I'd still want it saved as /var/lib/xen/save/010-ldap0.

> A better approach would be to have a LSB style header specifying
> dependencies to flag VMs which should be saved or shutdown late,
> and VMs which should be saved or shutdown early.
> > A ridiculous overkill solution might be to turn the /etc/xen/*.cfg > files into full init scripts.I don't think that we should be proposing to change the config language of upstream Xen or diverge from how domains are usually configured with upstream Xen. I think that we can get a lot of improvement without modifying the format of the config files and only by changing how the start and shutdown scripts work. At the moment domain start and shutdown is serial in nature and can take a long time. I don't know if there is any scope for improving that in scripts, or whether it's an upstream conversation, either way not for this bug. But because of the lengthy process I do have an interest in starting my important domains first and shutting them down last. Presently I am handling this by numbering the links in the auto directory, and using my own script that saves or shuts things down in the order I want. I can see how this could be improved but I'm not sure it's worth spending a large amount of effort on it and/or coming up with a complicated solution. I have multiple dom0s so where I have concerns about an essential service being unavailable I take steps to make that service redundant and then I don't have to care so much about whether the domain for that service is shut down 1st or 100th. While being able to control ordering of shutdown would be NICE, it seems like this would be catering to the administrator of a single dom0 that can't make services redundant. This raises the question of what are such administrators doing about the risk of their one dom0 host becoming unavailable and all its domains with it? I also feel that trying to add dependency logic into the configuration is stepping into territory best left to actual cluster management software, that says what order things should start/stop in, how many copies of them need to run, where they can be allowed to run for redundancy purposes, etc. Thanks, Andy
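The ordering proposed in this message (walk /etc/xen/auto in REVERSE, then act on any remaining running domains) can be sketched in Python as below. This is only an illustration under assumptions: the function name `shutdown_order` is hypothetical, the domain names are assumed to have been parsed out of the config files already, and the list of running domains is assumed to come from some existing tool in domid order.

```python
def shutdown_order(auto_names, running):
    """Return the order in which to save/shut down domains.

    auto_names: domain names in the order their files sort in /etc/xen/auto
                (i.e. start order).
    running:    names of currently running domains, in increasing domid order.
    """
    running_set = set(running)
    listed = set(auto_names)
    # 1. Domains listed in "auto", in reverse start order, skipping any that
    #    are not running (the quick "is it still running?" check).
    order = [name for name in reversed(auto_names) if name in running_set]
    # 2. Everything else that is running, in domid order as happens today.
    order += [name for name in running if name not in listed]
    return order

auto = ["ldap0", "fileserver", "mail"]
running = ["fileserver", "scratch", "ldap0", "mail"]
print(shutdown_order(auto, running))
```

In this example "scratch" has no link in "auto", so it is handled after "ldap0"; an administrator who wants "ldap0" to go last would need to link every domain.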
Elliott Mitchell
2021-Sep-27 21:16 UTC
[Pkg-xen-devel] Bug#452721: Bug#452721: #452721 moreinfo?
On Mon, Sep 27, 2021 at 05:13:04PM +0000, Andy Smith wrote:> On Sun, Sep 26, 2021 at 08:07:58PM -0700, Elliott Mitchell wrote: > > During a full downtime when all VMs were fully shut down, this effect > > can be achieved by including numbers in the filename. Say > > /etc/xen/auto/0_ldap.cfg, /etc/xen/auto/1_fileserver.cfg, > > /etc/xen/auto/9_everything_else.cfg. > > I also do this to control start up order, though I use a prefix of > NNN-. > > The main missing functionality from my point of view is not being > able to control the order of save/shutdown. As you say the script > for saving everything or shutting everything down just does a read > of all existing domids and does the action on them one by one in > increasing order.Seems we're running into the same problems, coming up with the same first-tier workaround and now we all need a common complete solution.> I think the "auto" directory is a pretty good and simple interface, > so how about using it for save/shutdown as well? So, instead of just > enumerating all running domids, enumerate all files in > /etc/xen/auto/ in REVERSE order, parsing the name of the domain out > of each one and doing the action on that name. When all files have > been exhausted, THEN do the action on any remaining running domains. > > This has the advantages of: > > - still working even if administrator does not use ordering in > /etc/xen/auto. Filename format there does not change from what it > is now, where ordering is already possible but is optional. > > - being quite obvious behaviour - save/shutdown order is reverse of > start order.This though requires something which understands the format of those files, can retrieve name or uuid, and then resolve that to something suitable for `xl {save|shutdown}`. Alternatively this requires `xl {save|shutdown}` to be able to select the target domain based on the configuration file (documentation reads like this might be halfway implemented). 
Additionally this needs a tool to identify domains which are NOT listed in /etc/xen/auto/ then do save/shutdown on them first.> That seems like a good minimal improvement, but if one wanted to > explicitly control save/shutdown order then perhaps the next > enhancement could be an /etc/xen/shutdown/ directory with similar > purpose to the "auto" one? i.e.: > > 1. Enumerate files in "shutdown" directory in reverse order, getting > name from each and doing shutdown action on it > > 2. If there were no files there, instead use "auto" directory for > this purpose > > 3. Then do shutdown action on every remaining running domain as > usual > > Again this still results in everything getting a shutdown action if > administrator does not want to do any of this. > > It's an open question for me whether step 2 (falling back to > enumerating "auto" directory) only happens when "shutdown" directory > is empty or if it should happen all of the time.This strikes me (note, I am NOT a Debian maintainer) as likely to involve too much work for too little gain. For complex setups this won't be enough, for simple setups this will be overkill.> > If the hypervisor is rebooted and VMs are saved to /var/lib/xen/save; > > they will be paused in identifier order, but saved by domain name. When > > scanning /var/lib/xen/save, `xendomains` goes by filename which means VMs > > are restored in a distinct (and often problematic) order. > > > > A minimal solution would be for `xendomains` to save VMs in > > /var/lib/xen/save <domId>-<name> and then use `sort -n` during restore. > > If by this you mean it would be good if the "save all" action picked > the filename from the filename in the "auto" directory, to replicate > that directory's ordering, then I agree. > > If however you mean the actual Xen domid of the running domain then > I'm not sure what that would buy us. 
If I had a domain with a > filename of 010-ldap0.cfg it might get strted first and have domid > 1, but then I reboot it and it has domid 99, I wouldn't want it > saved as /var/lib/xen/save/99-ladp0, I'd still want it saved as > /var/lib/xen/save/010-ladp0,Minimal meaning very simple to implement, but very limited. The idea is domains which start later get higher domain Ids. As long as crucial domains rarely get restarted, they will tend to keep low domain Ids. This fails when a crucial domain gets restarted late due to some reason, but this might capture enough low-hanging fruit to be worthwhile.> > A better approach would be to have a LSB style header specifying > > dependencies to flag VMs which should be saved or shutdown late, > > and VMs which should be saved or shutdown early. > > > > A ridiculous overkill solution might be to turn the /etc/xen/*.cfg > > files into full init scripts. > > I don't think that we should be proposing to change the config > language of upstream Xen or diverge from how domains are usually > configured with upstream Xen. I think that we can get a lot of > improvement without modifying the format of the config files and > only by changing how the start and shutdown scripts work. > > At the moment domain start and shutdown is serial in nature and can > take a long time. I don't know if there is any scope for improving > that in scripts, or whether it's an upstream conversation, either > way not for this bug. But because of the lengthy process I do have > an interest in starting my important domains first and shutting them > down last.I'm pretty sure #452721 is tagged "upstream" since the `xendomains` originates from the Xen project. If a solution is likely to be pushed back to the Xen project, then nearly anything is on the table. Just an issue of how much time is needed. What I was suggesting was NOT to modify the configuration format. 
The idea was a program could treat the domain configuration as if it was a script (get it from argv[0]), then simply implement start/stop (roughly system(`xl create $0`)). Ultimately I suspect the domain configuration files need to add an "init_handler" setting for specifying a program to be used for start/stop. Then "init_config" setting for configuring that program. If this is saved in the runtime configuration (`xl list -l`), then unhandled domains are readily identified by the lack of this configuration.> While being able to control ordering of shutdown would be NICE, it > seems like this would be catering to the administrator of a single > dom0 that can't make services redundant. This raises the question > of what are such administrators doing about the risk of their one > dom0 host becoming unavailable and all its domains with it?I suspect this crowd are the ones Debian should be catering most to. Large enough to have some fairly complicated needs, but small enough not to have a full IT department. There are also a very large number of people in this category.> I also feel that trying to add dependency logic into the > configuration is stepping into territory best left to actual cluster > management software, that says what order things should start/stop > in, how many copies of them need to run, where they can be allowed > to run for redundancy purposes, etc.True, though a little bit would help many people. -- (\___(\___(\______ --=> 8-) EHM <=-- ______/)___/)___/) \BS ( | ehem+sigmsg at m5p.com PGP 87145445 | ) / \_CS\ | _____ -O #include <stddisclaimer.h> O- _____ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
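A very rough Python sketch of the "treat the domain configuration as a script" idea follows. It assumes a handler program that receives the configuration path and an init-style action and maps them onto an `xl` command line; the function name `build_command` is hypothetical, the domain name is assumed to have been read from the configuration already, and the command is only built here, not executed.

```python
def build_command(cfg_path, action, domain_name):
    """Map an init-style action onto an xl command line (built, not run)."""
    if action == "start":
        # `xl create` takes the path of the domain configuration file.
        return ["xl", "create", cfg_path]
    if action == "stop":
        # `xl shutdown` takes a domain name or id, not a config file.
        return ["xl", "shutdown", domain_name]
    raise ValueError("unsupported action: " + action)

# Hypothetical example values.
print(build_command("/etc/xen/auto/010-ldap0.cfg", "start", "ldap0"))
print(build_command("/etc/xen/auto/010-ldap0.cfg", "stop", "ldap0"))
```

The asymmetry is the point made above: starting works from the file alone, while stopping needs something that can resolve the file to a domain name or UUID.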
Diederik de Haas
2021-Sep-28 09:45 UTC
[Pkg-xen-devel] Bug#452721: Bug#452721: "xendomains" does not restore domains in same order as it would start them
On Monday, 27 September 2021 19:13:04 CEST Andy Smith wrote:
> I think the "auto" directory is a pretty good and simple interface,
> so how about using it for save/shutdown as well? So, instead of just
> enumerating all running domids, enumerate all files in
> /etc/xen/auto/ in REVERSE order, parsing the name of the domain out
> of each one and doing the action on that name. When all files have
> been exhausted, THEN do the action on any remaining running domains.

I'm not familiar with the "auto" directory('s functionality), but I _assume_ that it's a directory which contains Xen domain config files which are automatically started up at boot time (in alphabetical sequence). The user can choose to start other VMs if (s)he so chooses.

If that's correct, then I find it more logical to do *everything* in reverse. The VM that was started first, should be saved/shutdown last and IIUC your proposal would not do that.

What makes the most sense to me is that the last started VM should be saved/shutdown first, which would be one of the "remaining running domains". Once all the "remaining" ones have been saved/shutdown, THEN do the auto ones in reverse order.

Could the domain ID be used for that? I haven't studied it 'in detail', but they seem sequential.

But my main point is that I think the proposed sequence should be adjusted.

Cheers,
Diederik
Andy Smith
2021-Sep-28 11:41 UTC
[Pkg-xen-devel] Bug#452721: Bug#452721: "xendomains" does not restore domains in same order as it would start them
Hi Diederik, On Tue, Sep 28, 2021 at 11:45:08AM +0200, Diederik de Haas wrote:> On Monday, 27 September 2021 19:13:04 CEST Andy Smith wrote: > > I think the "auto" directory is a pretty good and simple interface, > > so how about using it for save/shutdown as well? So, instead of just > > enumerating all running domids, enumerate all files in > > /etc/xen/auto/ in REVERSE order, parsing the name of the domain out > > of each one and doing the action on that name. When all files have > > been exhausted, THEN do the action on any remaining running domains. > > I'm not familiar with the "auto" directory('s functionality), but I _assume_ > that it's a directory which contains Xen domain config files which are > automatically started up at boot time (in alphabetical sequence). > The user can choose to start other VMs if (s)he so chooses.Yes; you typically symlink for example /etc/xen/foo.cfg to /etc/xen/auto/100-foo.cfg so as to enforce some order of automatic startup. Currently shutdown just goes in order of running domain id though.> If that's correct, then I find it more logical to do *everything* in reverse. > The VM that was started first, should be saved/shutdown last and IIUC your > proposal would not do that.No, that's what I am suggesting too: again walk the "auto" directory but in reverse order.> Once all the "remaining" ones have been saved/shutdown, THEN do > the auto ones in reverse order.A problem here would be excluding the domains that have a specified order from the initial round of shutdowns, which is why I suggested doing it in reverse order by the "auto" directory and THEN shutting down anything that's left as normal, since that way you don't need to check anything. 
As you've pointed out, this does mean that if you had linked say /etc/xen/auto/010-important.cfg with the intention that it be started first and shut down last, you would have to also link in every other domain in its correct order otherwise the not-mentioned ones would be shut down after 010-important.

However, I feel like people who use the /etc/xen/auto directory do already link all or the majority of their domains in there - I certainly do. I don't find it onerous to say that if you want to specify shutdown order then you must link all of the domains in /etc/xen/auto not just some of them.

Otherwise, if you wanted to say that all non-mentioned domains must be shut down first then I guess you'd have to parse the list of domain names from the "auto" directory first, then get the list of running domains from /usr/lib/xen-common/bin/xen-init-list and exclude one from the other for the initial round of shutdowns.

> Could the domain ID be used for that?

I don't like it because it only says how recently a domain was started relative to others, not any intention about start/stop order. Shut one down manually (or crash) and start it again and it gets a new domid higher than all existing.

> But my main point is that I think the proposed sequence should be adjusted.

We agree about reverse order, I think we only disagree about when to shut down domains that don't have a preference set. I am all for keeping it simple by saying ordering must be set for all domains otherwise ordering for remaining ones is not defined.

Cheers,
Andy
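The alternative described in this message (shut down all non-mentioned domains first, then the "auto" ones in reverse) amounts to a set difference between the running domains and the names found in "auto". A minimal Python sketch, assuming both lists have already been obtained; the function name `unlisted_first` is hypothetical, and the unlisted domains are simply kept in the order they were given:

```python
def unlisted_first(auto_names, running):
    """Shut down domains without an "auto" entry first, then the rest.

    auto_names: domain names in start order, parsed from the "auto" directory.
    running:    names of currently running domains.
    """
    listed = set(auto_names)
    running_set = set(running)
    # Initial round: running domains that "auto" does not mention.
    first = [name for name in running if name not in listed]
    # Final round: listed domains that are running, in reverse start order.
    last = [name for name in reversed(auto_names) if name in running_set]
    return first + last

auto = ["storage", "ldap0"]
running = ["storage", "ldap0", "web1", "web2"]
print(unlisted_first(auto, running))
```

Here "storage", started first, is the last domain to be shut down even though "web1" and "web2" have no links in "auto".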
Diederik de Haas
2021-Sep-28 21:39 UTC
[Pkg-xen-devel] Bug#452721: Bug#452721: "xendomains" does not restore domains in same order as it would start them
Hi Andy,

On Tuesday, 28 September 2021 13:41:57 CEST Andy Smith wrote:
> > If that's correct, then I find it more logical to do *everything* in
> > reverse. The VM that was started first, should be saved/shutdown last and
> > IIUC your proposal would not do that.
>
> No, that's what I am suggesting too: again walk the "auto" directory
> but in reverse order.

I understood that. My point was about the sequence of auto vs non-auto.

> > Once all the "remaining" ones have been saved/shutdown, THEN do
> > the auto ones in reverse order.
>
> A problem here would be excluding the domains that have a specified
> order from the initial round of shutdowns

That may involve some scripting/programming, but *I* don't consider that a valid argument to not do it...

> which is why I suggested doing it in reverse order by the "auto" directory
> and THEN shutting down anything that's left as normal, since that way you
> don't need to check anything.

... and I think that's wrong.

The use case I'm imagining is some domains are important or even essential for the working of other domains and that's why you want/need to start them as soon as possible with (potentially) a dependence between them, therefore you specify the correct order in the auto directory.

Let's say you have a special storage domain which provides storage for all the domU domains. Without that domain running you CANNOT start any other domain, so you start that domain first in the auto directory. If you start the shutdown/save-all-domains procedure the storage domain MUST be the last one to be shut down, because otherwise you'd be pulling the storage out from under live domains, which likely will make them crash. In any case, they will not be able to shut down cleanly or save their current state to disk ...
because the disks are gone.> As you've pointed out, this does mean that if you had linked say > /etc/xen/auto/010-important.cfg with the intention that it be > started first and shut down last, you would have to also link in > every other domain in its correct order otherwise the not-mentioned > ones would be shut down after 010-important.Indeed. I hope that I explained sufficiently why that is wrong or can even be catastrophic AFAICT.> However, I feel like people who use the /etc/xen/auto directory do > already link all or the majority of their domains in thereI think that that is a (too) dangerous assumption. You could use (say) 3 domains which provide essential services to all other domains, but after that every user is free to do whatever (s)he wants.> - I certainly do. I don't find it onerous to say that if you want to > specify shutdown order then you must link all of the domains in > /etc/xen/auto not just some of them.That would make the scenario I described above unworkable or needlessly complex, so I don't think that's a good/valid solution.> > Could the domain ID be used for that? > > I don't like it because it only says how recent a domain was > started relative to others, not any intention about start/stop > order. Shut one down manually (or crash) and start it again and it > gets a new domid higher than all existing.It is a (really) simple heuristic and likely too simple. But at first glance it seemed (to me) to actually do the right thing.> We agree about reverse order, I think we only disagree about when to > shut down domains that don't have a preference set.Indeed.> I am all for keeping it simple by saying ordering must be set for all > domains otherwise ordering for remaining ones is not defined.I like simple too, but I think this actually makes it complex. I really agree with the 'upstream' tag as not only should it be fixed/adjusted there, but it also engages a (much) larger audience who think of scenarios we likely didn't think about. 
And they're certainly much more knowledgeable than I am.

Cheers,
Diederik
Andy Smith
2021-Sep-28 22:02 UTC
[Pkg-xen-devel] Bug#452721: Bug#452721: "xendomains" does not restore domains in same order as it would start them
Hi Diederik, On Tue, Sep 28, 2021 at 11:39:49PM +0200, Diederik de Haas wrote:> On Tuesday, 28 September 2021 13:41:57 CEST Andy Smith wrote: > > We agree about reverse order, I think we only disagree about when to > > shut down domains that don't have a preference set. > > Indeed.Okay, well I am satisfied by the lesser idea of having to specify order of all domains but if the implementer of the solution isn't and decides to implement it so not-specified domains shut down first then that works for me too so have no objection to it. The idea of the domid controlling/influencing order of shutdown would not work for me as to me that is not much different to how we have things now - domains shut down in increasing order of domid. I can't control it so I just would continue using my own shutdown script.> I really agree with the 'upstream' tag as not only should it be fixed/adjusted > there, but it also engages a (much) larger audience who think of scenarios we > likely didn't think about.Should we move discussion to xen-users at lists.xen.org then? Cheers, Andy
Diederik de Haas
2021-Sep-28 23:24 UTC
[Pkg-xen-devel] Bug#452721: Bug#452721: "xendomains" does not restore domains in same order as it would start them
Hi Andy,

On Wednesday, 29 September 2021 00:02:46 CEST Andy Smith wrote:
> On Tue, Sep 28, 2021 at 11:39:49PM +0200, Diederik de Haas wrote:
> The idea of the domid controlling/influencing order of shutdown

It was just an idea that popped into my head. All in all I've likely spent less than a minute thinking about the domid idea. Don't spend more on it than you already have ;)

> > I really agree with the 'upstream' tag as not only should it be
> > fixed/adjusted there, but it also engages a (much) larger audience who
> > think of scenarios we likely didn't think about.
>
> Should we move discussion to xen-users at lists.xen.org then?

I can make a case for both xen-users and xen-devel.

xen-users:
It could be that a solution already exists. I know that Qubes (which uses Xen) has some dependency mechanism in that if you start vmA which depends on vmB, then it first starts vmB and then vmA. I don't know if that is a Qubes 'extension' or that they simply use available functionality of Xen.

xen-devel:
If needed functionality doesn't yet exist and needs to be built anew, then xen-devel is the right place to discuss that.

It could be that the best place to start is xen-users which then may/could 'transition' to xen-devel.

Let's hear others first what they think is the best approach.

Cheers,
Diederik
Elliott Mitchell
2021-Sep-29 02:23 UTC
[Pkg-xen-devel] Bug#452721: Bug#452721: "xendomains" does not restore domains in same order as it would start them
On Tue, Sep 28, 2021 at 11:39:49PM +0200, Diederik de Haas wrote:
> On Tuesday, 28 September 2021 13:41:57 CEST Andy Smith wrote:
> >
> > > Could the domain ID be used for that?
> >
> > I don't like it because it only says how recent a domain was
> > started relative to others, not any intention about start/stop
> > order. Shut one down manually (or crash) and start it again and it
> > gets a new domid higher than all existing.
>
> It is a (really) simple heuristic and likely too simple.
> But at first glance it seemed (to me) to actually do the right thing.

It is *definitely* too simple to do a good job; however, this has the advantages of being a significant improvement and simple enough to be in service quickly.

On Wed, Sep 29, 2021 at 01:24:58AM +0200, Diederik de Haas wrote:
> On Wednesday, 29 September 2021 00:02:46 CEST Andy Smith wrote:
> > On Tue, Sep 28, 2021 at 11:39:49PM +0200, Diederik de Haas wrote:
> > The idea of the domid controlling/influencing order of shutdown
>
> It was just an idea that popped in my head. All in all I've likely spend less
> then a minute thinking about the domid idea.
> Don't spend more on it then you already have ;)

The record shows I suggested it first: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=452721#35

This isn't an adequate solution, but is a distinct improvement.

> > > I really agree with the 'upstream' tag as not only should it be
> > > fixed/adjusted there, but it also engages a (much) larger audience who
> > > think of scenarios we likely didn't think about.
> >
> > Should we move discussion to xen-users at lists.xen.org then?
>
> I can make a case for both xen-users and xen-devel.
> xen-users:
> It could be that a solution already exists. I know that in Qubes (which uses
> Xen) has some dependency mechanism in that if you start vmA which depends on
> vmB, then it first starts vmB and then vmA.
I don't know if that is a Qubes > 'extension' or that they simply use available functionality of Xen.Could be interesting to learn of what solutions are already out there and what features are must have. Most existing solutions likely have problems. Some may be GPL-incompatible. Most are likely very limited.> xen-devel: > If needed functionality doesn't yet exist and needs to be built anew, then > xen-devel is the right place to discuss that. > > It could be that the best place to start is xen-users which then may/could > 'transition' to xen-devel. > > Let's hear others first what they think is the best approach.Perhaps. Question is how much person-time is available for this? If a great deal of xen-devel person-time can be devoted to this a very ambitious solution might be viable. If only a little bit of xen-devel person-time is available, the approach would need to be very limited. -- (\___(\___(\______ --=> 8-) EHM <=-- ______/)___/)___/) \BS ( | ehem+sigmsg at m5p.com PGP 87145445 | ) / \_CS\ | _____ -O #include <stddisclaimer.h> O- _____ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
Elliott Mitchell
2023-Jul-31 01:39 UTC
[Pkg-xen-devel] Bug#452721: irt: Bug#452721 notes from explorations
Even though there hasn't been any discussion recently, bug #452721 is
very much still of major concern to me.

First issue is how to parse domain configuration files. Reason being a
foo.cfg file might have the configuration 'name = "bar"'. This would
also let the script retrieve the UUID if that has been set.

Turns out while Python in domain configuration files isn't supported,
the syntax is still a proper subset of the Python language. This makes
Python the ideal programming language for a replacement script. The only
weakness is that full Python syntax is not available in configuration
files; having it would make the task simpler.

Presently I hope to convince the Xen core to allow full Python in domain
configuration files, but no news on that front so far. This would mean
/etc/default/xendomains would need to change to match Python syntax.

My thinking for adding to domain configuration files would be something
along these lines:

    init = {
        'tool': 'xendomains-ng',
        'version': 0,
        'order': 9,
        'startwait': 60,
        'stopaction': 'save',
    }

Mainly a Python dictionary holding key/value pairs. The thought behind
the 'tool' and 'version' values is to hope for some form of
compatibility if such scripts were to become common.

My thinking is 'order' would indicate sequence. Domains with higher
order get started first (same order would nominally allow parallel
start). If a domain.cfg file didn't define order then its order is 0.

'startwait' would tell the script to wait that long before starting
subsequent domains.

'stopaction' would allow different actions if the machine was to stop.
The 3 options which come to mind are 'stop' (shutdown), 'save' (save to
specified storage location), and 'migrate'.

If full Python doesn't become available, this might take the format:

    init = 'tool=xendomains-ng,version=0,order=9,startwait=60,stopaction=save'

Not needing to parse the string though does make one's life simpler.

Other concerns include:

Sometimes you may want to take a distinct action during stop. I.e. if
you're doing restarts for kernel updates, you'll want to override and
have domains reboot.

It may be handier to have distinct options for 'restart'. Full restarts
can follow proper order, or could simply involve bouncing domains based
on order. Notably with HVM domains and Qemu updates, you could do:

    order 0 down, order 1 down, order 9 down, order 9 up, order 1 up, order 0 up

Or you could do:

    order 9 down, order 9 up, order 1 down, order 1 up, order 0 down, order 0 up

I'm basically certain writing a new xendomains script in Python is the
way to go. Now to get an answer as to whether full Python in domain
configuration files could be reenabled.

-- 
(\___(\___(\______ --=> 8-) EHM <=-- ______/)___/)___/)
\BS ( | ehem+sigmsg at m5p.com PGP 87145445 | ) /
\_CS\ | _____ -O #include <stddisclaimer.h> O- _____ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
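The configuration-parsing idea in the message above can be sketched with Python's standard `ast` module, which reads the file as Python syntax without executing anything in it. This is only an illustration under stated assumptions: the function names (`read_domain_cfg`, `parse_init`, `init_settings`) are hypothetical, the default values for 'startwait' and 'stopaction' are guesses (the message only fixes the default 'order' at 0), and it relies on the message's claim that domain configuration syntax is a subset of Python.

```python
import ast

# Assumed defaults; only 'order': 0 comes from the proposal itself.
DEFAULT_INIT = {"order": 0, "startwait": 0, "stopaction": "stop"}


def read_domain_cfg(text):
    """Collect top-level `key = literal` assignments without running any code."""
    settings = {}
    for node in ast.parse(text).body:
        if not isinstance(node, ast.Assign):
            continue
        try:
            value = ast.literal_eval(node.value)
        except (ValueError, TypeError):
            continue  # right-hand side is not a plain literal; ignore it
        for target in node.targets:
            if isinstance(target, ast.Name):
                settings[target.id] = value
    return settings


def _coerce(text):
    """Turn '9' into 9 and leave non-numeric strings alone."""
    try:
        return int(text)
    except ValueError:
        return text


def parse_init(value):
    """Accept either the dictionary form or the 'k=v,k=v' string form."""
    if isinstance(value, dict):
        return dict(value)
    if isinstance(value, str):
        pairs = (item.split("=", 1) for item in value.split(",") if "=" in item)
        return {key.strip(): _coerce(val.strip()) for key, val in pairs}
    return {}


def init_settings(settings):
    """Merge a domain's 'init' entry over the assumed defaults."""
    merged = dict(DEFAULT_INIT)
    merged.update(parse_init(settings.get("init")))
    return merged
```

For example, `init_settings(read_domain_cfg(open(path).read()))` would give the effective settings for one file, and `read_domain_cfg(...)["name"]` recovers the domain name even when it differs from the file name.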
zithro
2023-Jul-31 17:10 UTC
[Pkg-xen-devel] Bug#452721: Bug#452721: irt: Bug#452721 notes from explorations
On 31 Jul 2023 03:39, Elliott Mitchell wrote:
> Presently I hope to convince the Xen core to allow full Python in domain
> configuration files, but no news on that front so far. This would mean
> /etc/default/xendomains would need to change to match Python syntax.

There was an answer today on xen-devel: the ability to use scripts in
domU cfg files has been explicitly removed for various reasons.
This does not prevent you from "source"-ing the cfg files in your
script(s) if they are proper Python syntax. Or you could simply
parse/regex the values you want.

And as Marek suggested in his answer, you can also put any arbitrary
settings in the comments.

Although ...

> My thinking for adding to domain configuration files would be something
> along these lines:
>
> init = {
>     'tool': 'xendomains-ng',
>     'version': 0,
>     'order': 9,
>     'startwait': 60,
>     'stopaction': 'save',
> }

The problem with adding this to a domU config file is that it could
cause problems for (live) migrations. The start/stop order is "per
dom0", and may be different on another one.
Imagine two dom0s, one storing the domain files "locally", while the
other uses NFS. Only in the second case should the domU wait for the NFS
server/domain to be available.

To me, the start/stop logic should be in a dom0 config file.

> 'startwait' would tell the script to wait that long before starting
> subsequent domains.

A time-based wait may be useful for when everything goes well, but what
about when there are problems?
If you want to be sure a domain is up (i.e. ready to serve), you would
need to peek at the related "service".
For example, to be sure a DNS domU is up, you would have to try a DNS
request, as a ping or "xl list" would not be enough.
Also, domains in xen/auto are started with a mix of serialization AND
parallelization, as "xl create" returns once the domain has started (i.e.
from Xen's point of view, not the user's).

> 'stopaction' would allow different actions if the machine was to stop.
> The 3 options which come to mind are 'stop' (shutdown), 'save' (save to
> specified storage location), and 'migrate'.

Then, each time you do NOT want to follow the usual action, you'd have
to edit -each- domU cfg file?

> If full Python doesn't become available, this might take the format:
> init = 'tool=xendomains-ng,version=0,order=9,startwait=60,stopaction=save'
> Not needing to parse the string though does make one's life simpler.

Well, it makes -your- life easier, not the maintainers' one ;)

> I'm basically certain writing a new xendomains script in Python is the
> way to go. Now to get an answer as to whether full Python in domain
> configuration files could be reenabled.

I'm not sure a Python script would solve anything, as (ba)sh variables
are imported from other files.
(see for example
https://salsa.debian.org/xen-team/debian-xen/-/blob/master/tools/hotplug/Linux/xendomains.in)

Everything considered, I'm not sure why Xen should provide such
functionality.
I think custom scripts can handle all the various use cases, don't you
think?
PS: as mentioned by diederik, the "dependency" logic has been handled
by Qubes for years, and it never made it to Xen (I don't know the
reasons though).

But I agree the shutdown sequence could be adapted to:
1. first shutdown the domains NOT in xen/auto
2. then shutdown the domains in xen/auto, in reverse order

For fine grained start/stop order, maybe having a dom0 config file
handling this could be added, like:

    # START/STOP ORDER
    # domains not in these lists will be started after and stopped
    # before the ones here
    start-order=(list of domU names)
    stop-order=(list of domU names)

But then again, this only ensures "domains" start order, not "services
availability" in said domains.

-- 
zithro / Cyril
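The dom0-side `start-order` / `stop-order` lists suggested in the message above could be turned into concrete sequences roughly as follows. This is a sketch only: the function names are hypothetical, the lists are assumed to have already been read from whatever dom0 file would hold them, and unlisted domains are ordered alphabetically purely to make the result deterministic.

```python
def start_sequence(start_order, domains):
    """Listed domains first, in the listed order; unlisted domains afterwards."""
    listed = [name for name in start_order if name in domains]
    unlisted = sorted(set(domains) - set(listed))
    return listed + unlisted


def stop_sequence(stop_order, running):
    """Unlisted domains are stopped first; listed domains follow in the listed order."""
    listed = [name for name in stop_order if name in running]
    unlisted = sorted(set(running) - set(listed))
    return unlisted + listed
```

Names that appear in a list but do not match a known or running domain are skipped. As the message points out, sequences like these only order the domains themselves; they say nothing about whether the services inside a domain are ready.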
Elliott Mitchell
2023-Aug-17 21:21 UTC
[Pkg-xen-devel] Bug#452721: irt: irt: Bug#452721 notes from explorations
Synthesizing things since I hadn't been copied on previous message...

On Mon Jul 31 18:10:34 BST 2023, zithro wrote:
>
> On 31 Jul 2023 03:39, Elliott Mitchell wrote:
>
> > Presently I hope to convince the Xen core to allow full Python in domain
> > configuration files, but no news on that front so far. This would mean
> > /etc/default/xendomains would need to change to match Python syntax.
>
> There was an answer today on xen-devel: the ability to use scripts in
> domU cfg files has been explicitly removed for various reasons.
> This does not prevent you from "source"-ing the cfg files in your
> script(s) if they are proper Python syntax. Or you could simply
> parse/regex the values you want.

Though the reasons given seem orthogonal to my thinking. I'm thinking of
using libpython as the parser since that allows dictionaries and
guarantees the syntax remains a subset of Python. Whereas the responses
read like they think I'm asking for full Python scripts as domain
configurations (which is a very large superset of what I'm proposing).

> And as Marek suggested in his answer, you can also put any arbitrary
> settings in the comments.

I had already thought of that as it is a common strategy for such things.
This though has substantial limitations and since Python has all the
capabilities needed, strategies based on Python seem very attractive.

I was thinking Perl for a bit, but Python provides a simple strategy for
extracting required information out of configurations. Crucially the
UUID which lets you match running domains to their configuration.

> Although ...
>
> > My thinking for adding to domain configuration files would be something
> > along these lines:
> >
> > init = {
> >     'tool': 'xendomains-ng',
> >     'version': 0,
> >     'order': 9,
> >     'startwait': 60,
> >     'stopaction': 'save',
> > }
>
> The problem with adding this to a domU config file is that it could
> cause problems for (live) migrations. The start/stop order is "per
> dom0", and may be different on another one.
> Imagine two dom0s, one storing the domain files "locally", while the
> other uses NFS. Only in the second case the domU should wait for the NFS
> server/domain to be available.
>
> To me, the start/stop logic should be in a dom0 config file.

I'm not understanding the situation you're thinking of.

The closest I can come is you're thinking of a situation which would be
handled by having host defaults, but also overrides in domain.cfg files.
Generic VMs would act according to the host settings, only domains which
had overridden values would act differently.

You could have a network of VM hosts where normal hosts specify 'migrate'
in /etc/default/xendomains. Then you have the magic host which specifies
'save' or 'shutdown'. You would also specify something other than
'migrate' for domains handling services local to a particular host.

> > 'startwait' would tell the script to wait that long before starting
> > subsequent domains.
>
> A time-based wait may be useful for when everything goes well, but what
> about when there are problems ?
> If you want to be sure a domain is up (ie. ready to serve), you would
> need to peek at the related "service".
> For example, to be sure a DNS domU is up, you would have to try a DNS
> request, as a ping or "xl list" would not be enough.
> Also, domains in xen/auto are started with a mix of serialization AND
> parallelization, as "xl create" returns once the domain has started (ie.
> in the Xen point of view, not the user's).

Indeed. I'm well aware what I'm suggesting has major limitations. I'm
proposing what I consider feasible given available time. What you're
suggesting could be a feature for v2, which might be written based on
what I manage.

> > 'stopaction' would allow different actions if the machine was to stop.
> > The 3 options which come to mind are 'stop' (shutdown), 'save' (save to
> > specified storage location), and 'migrate'.
>
> Then, each time you do NOT want to follow the usual action, you'd have
> to edit -each- domU cfg file ?

Usually if you didn't want to follow the usual action, you would invoke
`xl` manually.

What has come to mind though is perhaps the action should be uploaded to
the xenstore. Then when an unusual action was desired, the xenstore
information could be changed and the action would follow the domain.
This though seems a feature for a future version.

> > If full Python doesn't become available, this might take the format:
> > init = 'tool=xendomains-ng,version=0,order=9,startwait=60,stopaction=save'
> > Not needing to parse the string though does make one's life simpler.
>
> Well, it makes -your- life easier, not the maintainers' one ;)

Given what the parser looks like, it shouldn't take much to make things
easier. I suspect getting rid of the Bison/Flex parser and using
libpython will be easier. Though the maintainers may disagree. Dunno
until things reach full implementation. What I've sent so far is merely
to identify the border between the lower-level parser and upper-level
stuff.

I suspect for a while it might be possible to switch between the two.
Until it turned out one or the other had very little uptake.

> > I'm basically certain writing a new xendomains script in Python is the
> > way to go. Now to get an answer as to whether full Python in domain
> > configuration files could be reenabled.
>
> I'm not sure a Python script would solve anything, as (ba)sh variables
> are imported from other files.
> (see for example
> https://salsa.debian.org/xen-team/debian-xen/-/blob/master/tools/hotplug/Linux/xendomains.in)

I'm well aware of this. I would turn those into Python-fragments (sort
of how domain.cfg files conform to Python-syntax). This would need some
shell scripting during installation to upgrade old files, but this seems
feasible. Roughly:

xen-utils-common.postinst:

    if grep -q -eXENDOMAINS /etc/default/xendomains; then
        . /etc/default/xendomains
        export XENDOMAINS_AUTO XENDOMAINS_MIGRATE XENDOMAINS_RESTORE \
            XENDOMAINS_SAVE XENDOMAINS_STOP_MAXWAIT
        /var/lib/dpkg/info/xen-utils-common.newdomainswriter.py
    fi

Where `xen-utils-common.newdomainswriter.py` is used to ensure quoting
matches Python's requirements. There would also be a need to ask
permission to upgrade the configuration file to something which older
tools won't accept. Worst case a replacement filename could be used.

> Everything considered, I'm not sure why Xen should provide such
> functionality.
> I think custom scripts can handle all the various use cases, don't you
> think ?
> PS: as mentioned by diederik, the "dependency" logic is already handled
> by Qubes since years, and it never made it to Xen (I don't know the
> reasons though).

Indeed. This is why my only desire for the Xen project is to allow full
Python syntax, by using libpython instead of the Bison/Flex parser.

Having generic scripts in the Xen Project might be nice though...

> But I agree the shutdown sequence could be adapted to :
> 1. first shutdown the domains NOT in xen/auto
> 2. then shutdown the domains in xen/auto, in reverse order

I was thinking of slightly different logic. Domains without a specified
order are order 0. Domains with higher order start first and are
shut down last. This has two implications:

If a domain doesn't have a configuration file in /etc/xen/auto then it
has order 0 and will be stopped at the same time as other order 0
domains.

There is no requirement for the order to be positive. I'm not sure what
sorts of things would start after defaults, but I'm sure someone will
come up with a use case.

I wouldn't bother explicitly making unlisted domains stop earlier. Simply
stop them with the rest of the commoners. Anything which needs to stop
after unlisted domains should have a positive order.

> For fine grained start/stop order, maybe having a dom0 config file
> handling this could be added, like:
>
> # START/STOP ORDER
> # domains not in these lists will be started after and stopped
> # before the ones here
> start-order=(list of domU names)
> stop-order=(list of domU names)

That should be feasible. Are you volunteering to write such a thing?

Several issues though:

Can this express domains which could nominally be started in parallel?
(domains in my proposal can have the same order)

How would this express domains which should have different shutdown
actions?

Then of course your last question equally applies to your proposal:

> But then again, this only ensures "domains" start order, not "services
> availability" in said domains.

Indeed. I'm suggesting something which seems feasible, not something
which is ideal. My hope is having "startwait" to specify a delay would
be Good Enough(tm) for a first version. How would your proposal handle
this?

-- 
(\___(\___(\______ --=> 8-) EHM <=-- ______/)___/)___/)
\BS ( | ehem+sigmsg at m5p.com PGP 87145445 | ) /
\_CS\ | _____ -O #include <stddisclaimer.h> O- _____ | / _/
8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445
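The order rule described in the message above (unspecified order is 0, higher order starts first and stops last, equal order may run in parallel, negative values allowed) can be sketched as a grouping step. The function names are hypothetical, and the per-domain orders are assumed to have already been extracted from the configuration files by some other means.

```python
def effective_orders(configured, domains):
    """Give every domain an order; anything without a configured order gets 0."""
    return {name: configured.get(name, 0) for name in domains}


def batches(orders, descending):
    """Group domains that share an order; each inner list could run in parallel."""
    levels = sorted(set(orders.values()), reverse=descending)
    return [sorted(n for n, o in orders.items() if o == level) for level in levels]


def start_plan(orders):
    """Highest order first."""
    return batches(orders, descending=True)


def stop_plan(orders):
    """Lowest order first, so the highest-order domains stop last."""
    return batches(orders, descending=False)
```

A domain that is running without any configured order simply lands in the order 0 batch together with the other defaults, which matches the "stop them with the rest of the commoners" behaviour described above. A 'startwait' delay would then be applied between consecutive batches of the start plan.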