Lentes, Bernd
2020-Oct-07 17:12 UTC
Is it possible that "virsh destroy" does not stop a domain ?
Hi,

Is it possible that "virsh destroy" does not stop a domain?
I'm asking because I have some domains running in a two-node HA cluster (pacemaker).
Sometimes one node gets fenced (killed) because it couldn't stop a domain.
That's very ugly.

This is also the reason why I asked before what "virsh destroy" really does.
IIRC a kill -9 can't terminate a process which is in "D" state (uninterruptible sleep).
So if the process of the domain is in "D" state, it can't be terminated. Right?

Pacemaker tries to shut down or destroy a domain with a resource agent, which is a shell script, similar to an init script.

Here is an excerpt from the resource agent for virtual domains:

    force_stop()
    {
        local out ex translate
        local status=0

        ocf_log info "Issuing forced shutdown (destroy) request for domain ${DOMAIN_NAME}."
        out=$(LANG=C virsh $VIRSH_OPTIONS destroy ${DOMAIN_NAME} 2>&1) # the domain is destroyed here
        ex=$?
        translate=$(echo $out|tr 'A-Z' 'a-z')
        echo >&2 "$translate"
        case $ex$translate in
            *"error:"*"domain is not running"*|*"error:"*"domain not found"*|\
            *"error:"*"failed to get domain"*)
                : ;; # unexpected path to the intended outcome, all is well (success)
            [!0]*)
                ocf_exit_reason "forced stop failed" # <============ failure of destroy seems to be possible
                return $OCF_ERR_GENERIC ;;
            0*)
                while [ $status != $OCF_NOT_RUNNING ]; do
                    VirtualDomain_status
                    status=$?
                done ;;
        esac
        return $OCF_SUCCESS
    }

The function force_stop is responsible for stopping/destroying the domain, and it accounts for a "virsh destroy" that does not work.
Is there a developer who can explain what "virsh destroy" really does?
Or is there another ML for the developers?

Bernd

--

Bernd Lentes
Systemadministration
Institute for Metabolism and Cell Death (MCD)
Building 25 - office 122
HelmholtzZentrum München
bernd.lentes@helmholtz-muenchen.de
phone: +49 89 3187 1241
phone: +49 89 3187 3827
fax: +49 89 3187 2294
http://www.helmholtz-muenchen.de/mcd

stay healthy
Helmholtz Zentrum München
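The "D" state suspicion raised above can be checked on the node while a stop is hanging. The following is only a sketch: the helper names are made up for illustration, and the pidfile path in the usage note is an assumption about a typical libvirt/QEMU setup, not something confirmed in this thread. It reads the process state that ps reports and tests whether it starts with "D".

```shell
#!/bin/sh
# Hypothetical helpers (not part of virsh or the resource agent).

# state_is_uninterruptible STAT
# Succeeds if a ps STAT string (e.g. "D", "D+", "Ss") starts with "D",
# i.e. the process is in uninterruptible sleep.
state_is_uninterruptible() {
    case $1 in
        D*) return 0 ;;
        *)  return 1 ;;
    esac
}

# pid_is_uninterruptible PID
# Returns 0 if PID is in "D" state, 1 if it is in another state,
# 2 if ps reports nothing for that PID.
pid_is_uninterruptible() {
    stat=$(ps -o stat= -p "$1" 2>/dev/null | tr -d ' ')
    [ -n "$stat" ] || return 2
    state_is_uninterruptible "$stat"
}
```

Possible usage, assuming the QEMU PID of the domain is stored in /run/libvirt/qemu/<domain>.pid (an assumption, adjust for your system): `pid_is_uninterruptible "$(cat /run/libvirt/qemu/vm1.pid)" && echo "qemu process is in D state"`.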
Peter Crowther
2020-Oct-07 17:26 UTC
Re: Is it possible that "virsh destroy" does not stop a domain ?
Bernd, another option would be a mismatch between the message that "virsh destroy" issues and the message that force_stop() in the pacemaker agent expects to receive. Pacemaker is trying to determine the success or failure of the destroy based on the concatenation of the text of the exit code and the text output by virsh; if either of those has changed between virsh versions, and especially if virsh destroy ever exits with a status other than zero, then you'll get that OCF error.

Do you know what $VIRSH_OPTIONS ends up as in your Pacemaker config, particularly whether --graceful is specified?

Cheers,

- Peter
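To make the matching described here concrete, the sketch below reproduces only the case statement from force_stop() in a stand-alone function. classify_destroy is a hypothetical helper, and the message strings passed to it are illustrative, not verbatim virsh output. It shows that any non-zero exit whose text does not match one of the three known phrases ends up in the "failed" branch.

```shell
#!/bin/sh
# classify_destroy EXIT_CODE OUTPUT
# Hypothetical helper mirroring the case statement in force_stop():
# the exit code and the lower-cased output are glued into one string
# and matched against shell patterns.
classify_destroy() {
    ex=$1
    translate=$(echo "$2" | tr 'A-Z' 'a-z')
    case $ex$translate in
        *"error:"*"domain is not running"*|*"error:"*"domain not found"*|\
        *"error:"*"failed to get domain"*)
            echo "already-stopped" ;;   # treated as success by the agent
        [!0]*)
            echo "failed" ;;            # agent returns OCF_ERR_GENERIC
        0*)
            echo "destroyed" ;;         # agent then polls the domain status
    esac
}

# Illustrative inputs (made-up message texts):
classify_destroy 1 "error: Domain is not running"
classify_destroy 1 "error: some wording the agent does not know"
classify_destroy 0 "Domain vm1 destroyed"
```

The second call is the case in question: a changed or unexpected error text combined with a non-zero exit code is reported as a failed stop, whatever the actual state of the domain is.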
Lentes, Bernd
2020-Oct-08 16:25 UTC
Re: Is it possible that "virsh destroy" does not stop a domain ?
Hi Peter,

that means in the end that with "virsh destroy" I can't be 100% sure that a domain is stopped.
Is there another way?

Bernd