Hello, when the Dom0 needs to be rebooted manually while several DomUs are running, the Dom0 will crash during shutdown, so that only a manual hardware reset will bring it back up. Current hypervisor: Xen 4.1.3. Current kernel: 2.6.32-5-xen-amd64. Is there a known issue, or what might cause this? When rebooting the Dom0 without any DomU running, the reboot is successful. Can someone help me? Thanks. Cheers, Maik
Casey DeLorme
2012-Sep-03 16:27 UTC
Re: Dom0 crashed when rebooting whilst DomU are running
This could have changed, but I am fairly certain rebooting Dom0 reboots Xen, hence any running DomUs will have to be shut down.

A 2010 reference: http://old-list-archives.xen.org/archives/html/xen-users/2010-04/msg00710.html

You could always try aliasing the shutdown command to shut down all running domUs by name; otherwise forcing them down as Dom0 does could damage your data.

On Mon, Sep 3, 2012 at 7:49 AM, Maik Brauer <maik.brauer@mbs-systems.net> wrote:
> Hello,
>
> when the Dom0 needs to be rebooted manually while several DomUs are running,
> the Dom0 will crash during shutdown, so that only a manual hardware reset
> will bring it back up.
>
> Current hypervisor: Xen 4.1.3
> Current kernel: 2.6.32-5-xen-amd64
>
> Is there a known issue, or what might cause this?
> When rebooting the Dom0 without any DomU running, the reboot is successful.
> Can someone help me?
> Thanks.
>
> Cheers,
> Maik
>
> _______________________________________________
> Xen-users mailing list
> Xen-users@lists.xen.org
> http://lists.xen.org/xen-users
Hi,

that is true, I understand that, but even if a rebooting Dom0 has to destroy the DomUs, the Dom0 itself should not crash and hang so that a hard reset is needed.

Is there not any shutdown control which can shut down the DomUs from the Dom0 automatically in case of a reboot or shutdown? When I manually shut down all DomUs before rebooting Dom0, the Dom0 will reboot and come up again.

On Sep 3, 2012, at 6:27 PM, Casey DeLorme wrote:
> This could have changed, but I am fairly certain rebooting Dom0 reboots Xen, hence any running DomUs will have to be shut down.
>
> You could always try aliasing the shutdown command to shut down all running domUs by name; otherwise forcing them down as Dom0 does could damage your data.
> [...]
Casey DeLorme
2012-Sep-03 17:10 UTC
Re: Dom0 crashed when rebooting whilst DomU are running
As stated, you can alias shutdown to do exactly what you need; it can be anything from a series of hard-coded operations to a complex custom shell script that parses your domains and closes each with feedback. I don't know of a specific xl toolstack solution, but if you are using the xm toolstack you can try this: http://www.novell.com/support/kb/doc.php?id=3029956

Keep in mind that without PV-on-HVM drivers or paravirtualized DomUs you'll have to use the `destroy` command from Dom0, which is not a graceful shutdown.

On Mon, Sep 3, 2012 at 12:41 PM, Maik Brauer <maik.brauer@mbs-systems.net> wrote:
> Hi,
>
> that is true, I understand that, but even if a rebooting Dom0 has to
> destroy the DomUs, the Dom0 itself should not crash and hang so that a
> hard reset is needed.
>
> Is there not any shutdown control which can shut down the DomUs from the
> Dom0 automatically in case of a reboot or shutdown?
> When I manually shut down all DomUs before rebooting Dom0, the Dom0
> will reboot and come up again.
> [...]
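A minimal sketch of the kind of shutdown wrapper Casey describes, assuming the xm toolstack (the `guest_names` helper and the canned `xm list` output below are illustrative, not part of any shipped tool):

```shell
#!/bin/sh
# Sketch of a pre-reboot hook that shuts guests down gracefully.
# Assumes the xm toolstack; "guest_names" is an illustrative helper.

# Extract guest domain names from `xm list`-style output:
# skip the header line and Domain-0 itself.
guest_names() {
    awk 'NR > 1 && $1 != "Domain-0" { print $1 }'
}

# On a live Xen host one would run (commented out here):
# for dom in $(xm list | guest_names); do
#     xm shutdown --wait "$dom"
# done
# /sbin/reboot

# Demonstration on canned `xm list` output:
sample='Name                ID  Mem VCPUs State  Time(s)
Domain-0             0 1024     2 r-----   100.0
web01                1  512     1 -b----    10.0
db01                 2  512     1 -b----    12.0'
printf '%s\n' "$sample" | guest_names
```

The --wait flag makes xm block until each guest has actually halted, which is the important part if the wrapper runs right before a reboot.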
Ian Campbell
2012-Sep-04 08:11 UTC
Re: Dom0 crashed when rebooting whilst DomU are running
Could you not top post please, it makes it rather hard to follow the flow of the conversation.

On Mon, 2012-09-03 at 18:10 +0100, Casey DeLorme wrote:
> As stated, you can alias shutdown to do exactly what you need; it can
> be anything from a series of hard-coded operations to a complex custom
> shell script that parses your domains and closes each with feedback.

Xen ships the "xendomains" initscript, which can halt guests on shutdown as well as automatically start specific guests on boot. It can also be configured to suspend/resume them or (I think) migrate them away.

For diagnosing the crash itself, more details will be required than were provided in the original post. Please see http://wiki.xen.org/wiki/Reporting_Bugs_against_Xen for some guidance. At a minimum we would need a capture (serial console or photo) of the crash backtrace.

Ian.
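For reference, the behaviour Ian mentions is controlled by the xendomains sysconfig file (on Debian, /etc/default/xendomains). A sketch with illustrative values; check the file shipped with your packages for the exact variable names and defaults:

```shell
# /etc/default/xendomains -- illustrative excerpt, not a complete file.
XENDOMAINS_SAVE=""                         # empty: shut guests down rather than saving them
XENDOMAINS_SHUTDOWN="--halt --wait"        # wait for each guest to halt cleanly
XENDOMAINS_SHUTDOWN_ALL="--all --halt --wait"
XENDOMAINS_RESTORE=false                   # nothing saved, so nothing to restore
XENDOMAINS_AUTO=/etc/xen/auto              # guest configs (symlinks) started at boot
```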
On Sep 4, 2012, at 10:11 AM, Ian Campbell wrote:
> Could you not top post please, it makes it rather hard to follow the
> flow of the conversation.
> [...]
> Xen ships the "xendomains" initscript, which can halt guests on shutdown
> as well as automatically start specific guests on boot. It can also be
> configured to suspend/resume them or (I think) migrate them away.
>
> For diagnosing the crash itself, more details will be required than were
> provided in the original post. Please see
> http://wiki.xen.org/wiki/Reporting_Bugs_against_Xen for some guidance.
> At a minimum we would need a capture (serial console or photo) of the
> crash backtrace.
>
> Ian.

I found out that it hangs during reboot of dom0 when more network interfaces are involved, like:

vif = [ 'mac=06:46:AB:CC:11:01, ip=<myIPadress>', '', '', 'mac=06:04:AB:BB:11:03, bridge=VLAN20, script=vif-bridge', '', 'mac=06:04:AB:BB:11:05, bridge=VLAN40, script=vif-bridge' ]

In case you use just one, or have the basic line in place, it works:

vif = [ '' ]

The system stops after initiating the reboot at the following line on the console: System will restart...........

In the logfile /var/log/messages you can find these as the last lines:

Sep 8 15:44:28 rootsrv01 shutdown[2445]: shutting down for system reboot
Sep 8 15:44:31 rootsrv01 kernel: [   73.716246] VLAN20: port 1(vif2.3) entering forwarding state
Sep 8 15:44:31 rootsrv01 kernel: [   74.500111] VLAN40: port 1(vif2.5) entering forwarding state
Sep 8 15:44:34 rootsrv01 kernel: [   77.317431] VLAN20: port 1(vif2.3) entering disabled state
Sep 8 15:44:34 rootsrv01 kernel: [   77.317490] VLAN20: port 1(vif2.3) entering disabled state
Sep 8 15:44:36 rootsrv01 kernel: [   79.368685] VLAN40: port 1(vif2.5) entering disabled state
Sep 8 15:44:36 rootsrv01 kernel: [   79.369156] VLAN40: port 1(vif2.5) entering disabled state
Sep 8 15:44:37 rootsrv01 kernel: Kernel logging (proc) stopped.
Sep 8 15:44:37 rootsrv01 rsyslogd: [origin software="rsyslogd" swVersion="4.6.4" x-pid="890" x-info="http://www.rsyslog.com"] exiting on signal 15.

In /var/log/daemon.log you can find these messages:

Sep 8 15:44:37 rootsrv01 acpid: exiting
Sep 8 15:44:37 rootsrv01 rpc.statd[750]: Caught signal 15, un-registering and exiting
Sep 8 15:44:37 rootsrv01 udevd-work[2276]: '/etc/xen/scripts/vif-setup offline type_if=vif' unexpected exit with status 0x000f

Cheers,
Maik
On Sep 8, 2012, at 4:50 PM, Maik Brauer wrote:
> I found out that it hangs during reboot of dom0 when more network
> interfaces are involved, like:
> vif = [ 'mac=06:46:AB:CC:11:01, ip=<myIPadress>', '', '', 'mac=06:04:AB:BB:11:03, bridge=VLAN20, script=vif-bridge', '', 'mac=06:04:AB:BB:11:05, bridge=VLAN40, script=vif-bridge' ]
> [...]
> Sep 8 15:44:37 rootsrv01 udevd-work[2276]: '/etc/xen/scripts/vif-setup offline type_if=vif' unexpected exit with status 0x000f

Hi Ian,

I found an interesting article on the internet describing exactly the same issue: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=630984

Can you confirm?
Ian Campbell
2012-Sep-10 08:39 UTC
Re: Dom0 crashed when rebooting whilst DomU are running
On Sat, 2012-09-08 at 15:50 +0100, Maik Brauer wrote:
> I found out that it hangs during reboot of dom0 when more
> network interfaces are involved, like:
> vif = [ 'mac=06:46:AB:CC:11:01, ip=<myIPadress>', '', '',
> 'mac=06:04:AB:BB:11:03, bridge=VLAN20, script=vif-bridge', '',
> 'mac=06:04:AB:BB:11:05, bridge=VLAN40, script=vif-bridge' ]

6 interfaces total, 3 of which have a random MAC on each reboot and all get put on the default bridge?

Is your default script vif-bridge or something else? Have you modified any of these scripts?

> In case you use just one, or have the basic line in place, it works:
> vif = [ '' ]
>
> The system stops after initiating the reboot at the following line on the console: System will restart...........

So this is a hang, not a crash as suggested originally?

If it is a hang then you might have some luck using the magic SysRq keys to print lists of blocked tasks. I'm not sure about Squeeze, but you might need to enable this as described in Documentation/sysrq.txt in the Linux source.

Blocked tasks are listed with SysRq-'w'. If you have a serial console then 't' will list all tasks, but that list can be quite long, so it is useless without a serial console.

> In the logfile /var/log/messages you can find these as the last lines:
> [...]
> In /var/log/daemon.log you can find these messages:
> Sep 8 15:44:37 rootsrv01 acpid: exiting
> Sep 8 15:44:37 rootsrv01 rpc.statd[750]: Caught signal 15, un-registering and exiting

All the above (both messages and daemon.log) look like normal parts of shutting down to me.

> Sep 8 15:44:37 rootsrv01 udevd-work[2276]: '/etc/xen/scripts/vif-setup offline type_if=vif' unexpected exit with status 0x000f

This might be worth following up on.

I would do this by adding near the top of vif-setup and/or vif-bridge (or whichever script you use):

exec 1>>/var/log/vif-setup.log
exec 2>&1

I would then also annotate vif-bridge all through the offline path with echo statements showing how far it got and what command was to be run next.

Ian.
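The exec-redirection idiom Ian suggests can be tried standalone before editing the hotplug scripts. A self-contained sketch using a temporary file in place of /var/log/vif-setup.log (the marker text and the subshell standing in for vif-setup are made up for the demo):

```shell
#!/bin/sh
# Demonstrates the exec-redirection idiom for instrumenting a hotplug
# script: everything written after the two execs lands in the log file.
log=$(mktemp)

# This subshell stands in for vif-setup; in the real script the two
# exec lines go near the top and the path is /var/log/vif-setup.log.
(
    exec 1>>"$log"
    exec 2>&1
    echo "vif-setup: entered offline path"   # progress marker
    ls /nonexistent-path-for-demo            # stderr is captured too
)

markers=$(grep -c 'vif-setup' "$log")
echo "progress markers logged: $markers"
rm -f "$log"
```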
On Sep 10, 2012, at 10:39 AM, Ian Campbell wrote:
> 6 interfaces total, 3 of which have a random MAC on each reboot and all
> get put on the default bridge?

No, not really. The bridge is different for each interface. We have VLAN20, VLAN40, etc. as bridges. These are also created at the beginning, when the system starts up (create_bridges.sh):

/usr/sbin/brctl addbr VLAN11
/usr/sbin/brctl addbr VLAN12
/usr/sbin/brctl addbr VLAN20
/usr/sbin/brctl addbr VLAN30
/usr/sbin/brctl addbr VLAN40
/sbin/ifconfig VLAN11 down -arp up
/sbin/ifconfig VLAN12 down -arp up
/sbin/ifconfig VLAN20 down -arp up
/sbin/ifconfig VLAN30 down -arp up
/sbin/ifconfig VLAN40 down -arp up

> Is your default script vif-bridge or something else? Have you modified
> any of these scripts?

No, I didn't modify anything. Still the original scripts.

> So this is a hang, not a crash as suggested originally?

Yes, you are right. It is just a hang.

> If it is a hang then you might have some luck using the magic SysRq keys
> to print lists of blocked tasks.
> [...]
> Blocked tasks are listed with SysRq-'w'. If you have a serial console then
> 't' will list all tasks, but that list can be quite long, so it is useless
> without a serial console.

The list is empty. SysRq-w and SysRq-t show nothing at all. There is nothing running anymore. It shows periodically: INFO: task xenwatch:12 blocked for more than 120 seconds

It seems that xenwatch is blocking the reboot here; is that assumption correct? But strangely enough, I can't see any process anymore with SysRq-t or SysRq-w.

> Sep 8 15:44:37 rootsrv01 udevd-work[2276]: '/etc/xen/scripts/vif-setup offline type_if=vif' unexpected exit with status 0x000f
>
> This might be worth following up on.

When putting a "sleep 5" in the stop section of /etc/init.d/xendomains:

case "$1" in
  start)
    start
    rc_status
    if test -f $LOCKFILE; then rc_status -v; fi
    ;;
  stop)
    stop
    rc_status -v
    sleep 5
    ;;

then the system shuts down as expected and reboots properly. In daemon.log I couldn't find the error

Sep 8 15:44:37 rootsrv01 udevd-work[2276]: '/etc/xen/scripts/vif-setup offline type_if=vif' unexpected exit with status 0x000f

anymore. It seems that it disappeared after putting a delay inside. Could it be a race condition here during shutdown, with the udev daemon?

> I would do this by adding near the top of vif-setup and/or vif-bridge
> (or whichever script you use):
> exec 1>>/var/log/vif-setup.log
> exec 2>&1
>
> I would then also annotate vif-bridge all through the offline path
> with echo statements showing how far it got and what command was to be
> run next.
>
> Ian.
Ian Campbell
2012-Sep-10 15:10 UTC
Re: Dom0 crashed when rebooting whilst DomU are running
On Mon, 2012-09-10 at 16:00 +0100, Maik Brauer wrote:
> > 6 interfaces total, 3 of which have a random MAC on each reboot and all
> > get put on the default bridge?
>
> No, not really. The bridge is different for each interface.

You have three lots of '' which will all go onto the same bridge AFAICT (whichever one is determined to be the default).

> > Blocked tasks are listed with SysRq-'w'. If you have a serial console then
> > 't' will list all tasks, but that list can be quite long, so it is useless
> > without a serial console.
>
> The list is empty. SysRq-w and SysRq-t show nothing at all.

You might need to increase the log verbosity with SysRq-9 first?

> There is nothing running anymore.
> It shows periodically: INFO: task xenwatch:12 blocked for more than 120 seconds

What is the very last thing printed before this?

> It seems that xenwatch is blocking the reboot here; is that assumption correct?
> But strangely enough, I can't see any process anymore with SysRq-t or SysRq-w.

The xenwatch thread ought to count as a process for at least the purposes of SysRq-t, if not -w.

> When putting a "sleep 5" in the stop section of /etc/init.d/xendomains:
> [...]
> then the system shuts down as expected and reboots properly.
> In daemon.log I couldn't find the error
> Sep 8 15:44:37 rootsrv01 udevd-work[2276]: '/etc/xen/scripts/vif-setup offline type_if=vif' unexpected exit with status 0x000f
> anymore. It seems that it disappeared after putting a delay inside. Could it be a race condition here during shutdown, with the udev daemon?

It could be a race between the guests actually shutting down and the rest of the initscripts running.

Really the initscript ought to wait; the default, at least with the script shipped with Xen, is to do so by using shutdown --wait. Can you confirm whether or not this is happening for you?

Possibly someone is trying to talk to xenstore after xenstored has exited -- I expect that would cause the sorts of "blocked for 120 seconds" messages you are seeing.
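The fixed "sleep 5" workaround above can be replaced by a bounded wait that polls until the condition it is papering over actually clears. A generic sketch; the helper name and the vif check are illustrative, not part of the shipped xendomains script:

```shell
#!/bin/sh
# Poll until a condition clears or a timeout expires, instead of
# sleeping for a fixed interval. "wait_until_gone" is an illustrative name.
wait_until_gone() {
    # $1 = max seconds; remaining args = a command that succeeds while
    # we still have to wait.
    max=$1; shift
    n=0
    while "$@" && [ "$n" -lt "$max" ]; do
        sleep 1
        n=$((n + 1))
    done
    [ "$n" -lt "$max" ]   # true if the condition cleared in time
}

# In the xendomains stop path one might wait for vif interfaces to go
# away (requires a live Xen host, hence commented out):
# wait_until_gone 30 sh -c "ip link show | grep -q ': vif'"

# Demonstration with a condition that is already clear:
wait_until_gone 5 false && echo "cleared"
```

The advantage over a fixed sleep is that the script returns as soon as the teardown has actually finished, and fails loudly if it never does.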
On Sep 10, 2012, at 5:10 PM, Ian Campbell wrote:> On Mon, 2012-09-10 at 16:00 +0100, Maik Brauer wrote: >> On Sep 10, 2012, at 10:39 AM, Ian Campbell wrote: >> >>> On Sat, 2012-09-08 at 15:50 +0100, Maik Brauer wrote: >>>> On Sep 4, 2012, at 10:11 AM, Ian Campbell wrote: >>>> >>>>> Could you not top post please, it makes it rather hard to follow the >>>>> flow of the conversation. >>>>> On Mon, 2012-09-03 at 18:10 +0100, Casey DeLorme wrote: >>>>>> As stated, you can alias shutdown to do exactly what you need, it can >>>>>> be as simple as a series of hard-coded operations to a complex custom >>>>>> shell script that parses your domains and closes each with feedback. >>>>> >>>>> Xen ships the "xendomains" initscript which can halt guest on shutdown >>>>> as well as automatically start specific guests on boot. It can also be >>>>> configured to suspend/resume them or (I think) migrate them away. >>>>> >>>>> For diagnosing the crash itself more details will be required than were >>>>> provided in the original post. Please see >>>>> http://wiki.xen.org/wiki/Reporting_Bugs_against_Xen for some guidance. >>>>> At a minimum we would need a capture (serial console or photo) of the >>>>> crash backtrace. >>>>> >>>>> Ian. >>>>> >>>>> >>>> I found out that it hangs during re-boot of dom0 when having more >>>> Network interfaces involved, like: >>>> vif = [ ''mac=06:46:AB:CC:11:01, ip=<myIPadress>'', '''', '''', >>>> ''mac=06:04:AB:BB:11:03, bridge=VLAN20, script=vif-bridge'', '''', >>>> ''mac=06:04:AB:BB:11:05, bridge=VLAN40, script=vif-bridge'' ] >>> >>> 6 interfaces total, 3 of which have a random mac on each reboot and all >>> get put on the default bridge? >> >> No, not really. The bridge is different for each interface. > > You have three lots of '''' which will all go onto the same bridge AFAICT > (whichever one is determined to be the default)That is right. 
As long as I put nothing inside that it should be a different script to execute, it will use default for ''''> >>> If it is a hang then you might have some luck using hte magic sysrq keys >>> to print lists of blocked tasks. I''m not sure in Squeeze but you might >>> need to enable this as described in Documentation/sysrq.txt in the Linux >>> source. >>> >>> Blocked tasks are listed with SysRQ-''w''. If you have serial console then >>> ''t'' will list all task, but that list can be quite long so it is useless >>> without a serial console. >> >> List is empty. SysRQ -w and SysRQ-t shows nothing at all. > > You might need to increase the log verbosity with SysRQ-9 first?I did and now I got more Information. But due to the amount of data which slips over the console screen I am not able to record properly. Can you advice what to do here?> >> There is nothing running anymore. >> It shows periodically: INFO: task xenwatch:12 blocked for more than 120 seconds > > What is the very last thing printed before this?There is nothing before. Just that message pops up periodically.> >> Seems that the xenwatch is blocking the reboot here, is that assumption correct? But strange enough that I can''t >> see any process anymore with the SysRQ -t or SysRQ -w > > The xenwatch thread ought to count as a process for at least the > purposes of SysRQ-t if not -w.Could be, but due to the amount it slips over the screen, that I am not able to read it line by line. 
Please advice a procedure to record.> >> >>>> In the Logfile of /var/log/message you can find this as the last line: >>>> Sep 8 15:44:28 rootsrv01 shutdown[2445]: shutting down for system reboot >>>> Sep 8 15:44:31 rootsrv01 kernel: [ 73.716246] VLAN20: port 1(vif2.3) entering forwarding state >>>> Sep 8 15:44:31 rootsrv01 kernel: [ 74.500111] VLAN40: port 1(vif2.5) entering forwarding state >>>> Sep 8 15:44:34 rootsrv01 kernel: [ 77.317431] VLAN20: port 1(vif2.3) entering disabled state >>>> Sep 8 15:44:34 rootsrv01 kernel: [ 77.317490] VLAN20: port 1(vif2.3) entering disabled state >>>> Sep 8 15:44:36 rootsrv01 kernel: [ 79.368685] VLAN40: port 1(vif2.5) entering disabled state >>>> Sep 8 15:44:36 rootsrv01 kernel: [ 79.369156] VLAN40: port 1(vif2.5) entering disabled state >>>> Sep 8 15:44:37 rootsrv01 kernel: Kernel logging (proc) stopped. >>>> Sep 8 15:44:37 rootsrv01 rsyslogd: [origin software="rsyslogd" swVersion="4.6.4" x-pid="890" x-info="http://www.rsyslog.com"] exiting on signal 15. >>>> >>>> In the /var/log/daemong.log you can find this message: >>>> Sep 8 15:44:37 rootsrv01 acpid: exiting >>>> Sep 8 15:44:37 rootsrv01 rpc.statd[750]: Caught signal 15, un-registering and exiting >>> >>> All the above (both message and daemon.log) look like normal parts of >>> shutting down to me. >>> >>>> Sep 8 15:44:37 rootsrv01 udevd-work[2276]: ''/etc/xen/scripts/vif-setup offline type_if=vif'' unexpected exit with status 0x000f >>> >>> This might be worth following up on. >> >> When putting a "sleep 5" in stop section of the /etc/init.d/xendomains: >> case "$1" in >> start) >> start >> rc_status >> if test -f $LOCKFILE; then rc_status -v; fi >> ;; >> >> stop) >> stop >> rc_status -v >> sleep 5 >> ;; >> >> then the system shuts down as expected and is rebooting properly. >> In the daemon.log file I couldn''t find the error: Sep 8 15:44:37 rootsrv01 udevd-work[2276]: ''/etc/xen/scripts/vif-setup offline type_if=vif'' unexpected exit with status 0x000f >> anymore. 
It seems that it disappeared after putting a delay inside. Could it be a race condition here during shutdown, with the udev-daemon?? > > It could be a race with the guests actually shuting down vs the rest of > the initscripts running. > > Really the initscript ought to wait, the default at least with the > script shipped with xen is to do so, by using shutdown --wait. can you > confirm whether or not this is happening for you?At least I can see that the shutdown --wait is in the scripts. So it seems that the init script is waiting. But independent from that, something must be still in use. Which block the reboot process.> > Possibly someone is trying to talk to xenstore after xenstored has > exited -- I expect that would cause the sorts of blocked for 120 > messages you are seeing. >Could be, but we need to find out what is blocking the shutdown. I do not know what else I can do in order to measure and collect data for investigation. Let me know what else I can do? You can easiliy reproduce this issue, when using more that 3 Network devices. I installed that now on several machines at home and I have on all the same issue when using more than 2-3 network Interfaces.> > > _______________________________________________ > Xen-users mailing list > Xen-users@lists.xen.org > http://lists.xen.org/xen-users
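The `sleep 5` workaround above papers over the race with a fixed delay. A more robust variant is to poll until no guest domains remain, with a timeout. This is a sketch, not the stock xendomains code; `count_guests` is a hypothetical helper standing in for something like `xm list | awk 'NR > 2' | wc -l` (rows after the header and Domain-0):

```shell
#!/bin/sh
# Sketch: instead of a fixed "sleep 5" in the stop) branch of
# /etc/init.d/xendomains, wait (bounded) until the guest count is zero.
# "count_guests" is a hypothetical stand-in for a real counting command.
wait_for_guests() {
    count_cmd=$1
    timeout=${2:-60}            # seconds to wait before giving up
    while [ "$timeout" -gt 0 ]; do
        if [ "$($count_cmd)" -eq 0 ]; then
            return 0            # all guests gone
        fi
        sleep 1
        timeout=$((timeout - 1))
    done
    return 1                    # guests still present after timeout
}
```

In the init script's `stop)` case one would call `wait_for_guests count_guests 60` after `stop`, so init only proceeds to tear down udev and the network once the vifs are really gone.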
Ian Campbell
2012-Sep-12 08:13 UTC
Re: Dom0 crashed when rebooting whilst DomU are running
On Tue, 2012-09-11 at 23:46 +0100, Maik Brauer wrote:> >>>> I found out that it hangs during re-boot of dom0 when having more > >>>> Network interfaces involved, like: > >>>> vif = [ ''mac=06:46:AB:CC:11:01, ip=<myIPadress>'', '''', '''', > >>>> ''mac=06:04:AB:BB:11:03, bridge=VLAN20, script=vif-bridge'', '''', > >>>> ''mac=06:04:AB:BB:11:05, bridge=VLAN40, script=vif-bridge'' ] > >>> > >>> 6 interfaces total, 3 of which have a random mac on each reboot and all > >>> get put on the default bridge? > >> > >> No, not really. The bridge is different for each interface. > > > > You have three lots of '''' which will all go onto the same bridge AFAICT > > (whichever one is determined to be the default) > > That is right. As long as I put nothing inside that it should be a > different script to execute, it will use default for ''''The default is "vif-bridge". Have you changed the default? If not then your configuration as shown will put three interfaces on the *same* bridge. Is this really what you want? You claim above that the bridge is different for each interface, but unless you have changed something somewhere then this is not the case. Since you are having problems it is important to identify everything which you have changed from the defaults.> >> List is empty. SysRQ -w and SysRQ-t shows nothing at all. > > > > You might need to increase the log verbosity with SysRQ-9 first? > > I did and now I got more Information. But due to the amount of data which slips over the console screen I am not able > to record properly. Can you advice what to do here?Like I said "that list can be quite long so it is useless without a serial console": http://wiki.xen.org/wiki/Xen_Serial_Console Depending on your distro you might also find this info in the logs under /var/log somewhere.> > > >> There is nothing running anymore. > >> It shows periodically: INFO: task xenwatch:12 blocked for more than 120 seconds > > > > What is the very last thing printed before this? 
> > There is nothing before.So the output is silent from boot until this message comes up? That seems unlikely, since there should be plenty of messages from the shutdown process itself if nothing else. What is the last message one the screen before this one? In fact what is the entire last screenfull of output?> > Really the initscript ought to wait, the default at least with the > > script shipped with xen is to do so, by using shutdown --wait. can you > > confirm whether or not this is happening for you? > > At least I can see that the shutdown --wait is in the scripts. So it seems that the init script is waiting. > But independent from that, something must be still in use. Which block the reboot process. > > > > Possibly someone is trying to talk to xenstore after xenstored has > > exited -- I expect that would cause the sorts of blocked for 120 > > messages you are seeing. > > > Could be, but we need to find out what is blocking the shutdown. I do not know what else I can do in order to measure and collect > data for investigation.Did you add debugging to the hotplug scripts like I suggested a couple of mails back? If you run the xendomains script by hand and then *immediately* after it exits run "xl list" have the domains actually gone? You could even stick some calls to xl list into the script itself and verify that the domains are indeed shutting down as expected. BTW Are you using xl or xend? Ian.
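For reference, capturing the hypervisor and dom0 console over serial (as the wiki page Ian links describes) mostly comes down to boot-loader configuration. The lines below are illustrative only; COM1, 115200 baud, and the GRUB-legacy syntax are assumptions to check against your hardware and the wiki:

```shell
# /boot/grub/menu.lst (GRUB legacy, as on Squeeze) -- illustrative sketch.
# Give Xen its own console on the first serial port:
#   kernel /boot/xen-4.1.3.gz com1=115200,8n1 console=com1,vga
# Have dom0 Linux log to the Xen console as well, keeping VGA:
#   module /boot/vmlinuz-2.6.32-5-xen-amd64 console=hvc0 console=tty0
# On the machine at the other end of the null-modem cable:
#   screen /dev/ttyS0 115200
```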
Ian Campbell
2012-Sep-12 10:23 UTC
Re: Dom0 crashed when rebooting whilst DomU are running
On Tue, 2012-09-11 at 23:46 +0100, Maik Brauer wrote:
> You can easiliy reproduce this issue, when using more that 3 Network
> devices.

I installed a PV guest on Debian Wheezy using 5 vifs all on the same bridge and could not reproduce this; the domain was successfully shut down on reboot.

I also replaced the Debian xendomains script with the one from xen-4.1-testing (since Debian's differs). Still no repro.

Ian.
On Sep 12, 2012, at 12:23 PM, Ian Campbell wrote:

> On Tue, 2012-09-11 at 23:46 +0100, Maik Brauer wrote:
>> You can easiliy reproduce this issue, when using more that 3 Network
>> devices.
>
> I installed a PV guest on Debian Wheezy using 5 vifs all on the same
> bridge and could not reproduce this, the domain was successfully
> shutdown on reboot.
>
> I also replaced the Debian xendomains script with the one from
> xen-4.1-testing (since Debian's differs). Still no repro.
>

Hi Ian,

please try it with the stable Squeeze one, because I am currently not using an unstable version. I have customers running on those machines, so I can't experiment. Can you please try it with Debian 64-bit?

> Ian.
On Sep 12, 2012, at 10:13 AM, Ian Campbell wrote:> On Tue, 2012-09-11 at 23:46 +0100, Maik Brauer wrote: >>>>>> I found out that it hangs during re-boot of dom0 when having more >>>>>> Network interfaces involved, like: >>>>>> vif = [ ''mac=06:46:AB:CC:11:01, ip=<myIPadress>'', '''', '''', >>>>>> ''mac=06:04:AB:BB:11:03, bridge=VLAN20, script=vif-bridge'', '''', >>>>>> ''mac=06:04:AB:BB:11:05, bridge=VLAN40, script=vif-bridge'' ] >>>>> >>>>> 6 interfaces total, 3 of which have a random mac on each reboot and all >>>>> get put on the default bridge? >>>> >>>> No, not really. The bridge is different for each interface. >>> >>> You have three lots of '''' which will all go onto the same bridge AFAICT >>> (whichever one is determined to be the default) >> >> That is right. As long as I put nothing inside that it should be a >> different script to execute, it will use default for '''' > > The default is "vif-bridge". Have you changed the default?No I did not change this default script. Everything is at is has been delivered in the Xen Source package.> > If not then your configuration as shown will put three interfaces on the > *same* bridge. Is this really what you want?No, because it will "not" put everything on the same bridge, because the default setting is "routed mode" due to the fact that my providers network configuration has changed the routing. Therefore in xend-config.sxp we have the following disabled: #(network-script network-route) #(vif-script vif-route) and the next one enabled: (network-script network-route) (vif-script vif-route) So basically you can see them as placeholder for the eth1, eth2 and eth4 devices in the Guest domU. For the other 2 interfaces it is different. They should be bridged (different from default). Therefore I have to put the "script=vif-bridge" in the config as shown above. 
See below the output of brctl show:

bridge name     bridge id               STP enabled     interfaces
VLAN11          8000.000000000000       no
VLAN12          8000.000000000000       no
VLAN20          8000.feffffffffff       no              vif2.3
VLAN30          8000.000000000000       no
VLAN40          8000.feffffffffff       no              vif2.5

> You claim above that the bridge is different for each interface, but
> unless you have changed something somewhere then this is not the case.
> Since you are having problems it is important to identify everything
> which you have changed from the defaults.

No, I am saying that the bridge name is different. Not that the script is different. I am just creating isolated bridges VLAN20, VLAN30, VLAN40, and so on in order to connect special network interfaces together from different domU's.

>>>> List is empty. SysRQ -w and SysRQ-t shows nothing at all.
>>>
>>> You might need to increase the log verbosity with SysRQ-9 first?
>>
>> I did and now I got more Information. But due to the amount of data which slips over the console screen I am not able
>> to record properly. Can you advice what to do here?
>
> Like I said "that list can be quite long so it is useless
> without a serial console": http://wiki.xen.org/wiki/Xen_Serial_Console

This will be a challenge.

> Depending on your distro you might also find this info in the logs
> under /var/log somewhere.

There is not really any useful information available there. It does not contain the info we need; I checked it already.

>>>> There is nothing running anymore.
>>>> It shows periodically: INFO: task xenwatch:12 blocked for more than 120 seconds
>>>
>>> What is the very last thing printed before this?
>>
>> There is nothing before.
>
> So the output is silent from boot until this message comes up? That
> seems unlikely, since there should be plenty of messages from the
> shutdown process itself if nothing else.

Yes, there are plenty of messages. Let me put some lines below:

Stopping NFS Daemon
Stopping portmap daemon.
Deconfiguring network interfaces
Listening on LPF/eth0/00:1c:42:77:7a:29
Sending on   LPF/eth0/00:1c:42:77:7a:29
DHCPRELEASE on eth0
Cleaning up ifdown.
Saving system clock.
Deactivating swap.
Will now restart.
[  720.213710] INFO: task xenwatch:12 blocked for more than 120 seconds
[  720.213753] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message
[  840.212745] INFO: task reboot:3347 blocked for more than 120 seconds
[  840.212785] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message

(the last INFO messages repeat indefinitely)

> What is the last message one the screen before this one? In fact what is
> the entire last screenfull of output?

See above.

>>> Really the initscript ought to wait, the default at least with the
>>> script shipped with xen is to do so, by using shutdown --wait. can you
>>> confirm whether or not this is happening for you?
>>
>> At least I can see that the shutdown --wait is in the scripts. So it seems that the init script is waiting.
>> But independent from that, something must be still in use. Which block the reboot process.
>>>
>>> Possibly someone is trying to talk to xenstore after xenstored has
>>> exited -- I expect that would cause the sorts of blocked for 120
>>> messages you are seeing.
>>>
>> Could be, but we need to find out what is blocking the shutdown. I do not know what else I can do in order to measure and collect
>> data for investigation.
>
> Did you add debugging to the hotplug scripts like I suggested a couple
> of mails back?

No, I didn't up to now.

> If you run the xendomains script by hand and then *immediately* after it
> exits run "xl list" have the domains actually gone?
> You could even stick some calls to xl list into the script itself and
> verify that the domains are indeed shutting down as expected.

root@xenserver:/etc/xen/scripts# xm list
Name                                  ID   Mem VCPUs      State   Time(s)
Domain-0                               0   880     1     r-----     82.4
dnssrv01-v6                            3   128     1     -b----      4.5
root@xenserver:/etc/xen/scripts# /etc/init.d/xendomains stop
Shutting down Xen domains: dnssrv01-v6(save)... [done].
root@xenserver:/etc/xen/scripts# xm list
Name                                  ID   Mem VCPUs      State   Time(s)
Domain-0                               0   880     1     r-----     89.4
root@xenserver:/etc/xen/scripts#

> BTW Are you using xl or xend?

I am using xend.

> Ian.
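Following Ian's suggestion, the same check can be made permanent so it runs on every shutdown rather than only interactively. A sketch for the `stop)` branch of `/etc/init.d/xendomains`; the log path is our choice, and `xm` is used because this setup runs xend:

```shell
# Sketch: log the domain list before and after the existing "stop"
# function, so the shutdown logs show whether guests were really gone
# before init continued.  The log path is arbitrary.
log=/var/log/xen/xendomains-stop.log
echo "=== $(date '+%F %T') before stop ===" >>"$log"
xm list >>"$log" 2>&1
stop                       # the script's existing stop function
echo "=== $(date '+%F %T') after stop ===" >>"$log"
xm list >>"$log" 2>&1
```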
Ian Campbell
2012-Sep-13 05:38 UTC
Re: Dom0 crashed when rebooting whilst DomU are running
On Wed, 2012-09-12 at 21:47 +0100, Maik Brauer wrote:
> On Sep 12, 2012, at 12:23 PM, Ian Campbell wrote:
>
> > On Tue, 2012-09-11 at 23:46 +0100, Maik Brauer wrote:
> >> You can easiliy reproduce this issue, when using more that 3 Network
> >> devices.
> >
> > I installed a PV guest on Debian Wheezy using 5 vifs all on the same
> > bridge and could not reproduce this, the domain was successfully
> > shutdown on reboot.
> >
> > I also replaced the Debian xendomains script with the one from
> > xen-4.1-testing (since Debian's differs). Still no repro.
> >
> Hi Ian, please try it with the stable squeeze one. Because I am currently not using an unstable version.
> I have customers running on that machines so I can't experiment. Can you please try with Debian 64bit?

In your initial mail you said you were running a 4.1.3 hypervisor.
Squeeze has 4.0. Which is it?

Ian.
Ian Campbell
2012-Sep-13 05:51 UTC
Re: Dom0 crashed when rebooting whilst DomU are running
On Wed, 2012-09-12 at 21:47 +0100, Maik Brauer wrote:
> On Sep 12, 2012, at 12:23 PM, Ian Campbell wrote:
>
> > On Tue, 2012-09-11 at 23:46 +0100, Maik Brauer wrote:
> >> You can easiliy reproduce this issue, when using more that 3 Network
> >> devices.
> >
> > I installed a PV guest on Debian Wheezy using 5 vifs all on the same
> > bridge and could not reproduce this, the domain was successfully
> > shutdown on reboot.
> >
> > I also replaced the Debian xendomains script with the one from
> > xen-4.1-testing (since Debian's differs). Still no repro.
> >
> Hi Ian, please try it with the stable squeeze one. Because I am currently not using an unstable version.
> I have customers running on that machines so I can't experiment.

What about one of the several machines you have installed at home which show this issue?
Ian Campbell
2012-Sep-13 05:51 UTC
Re: Dom0 crashed when rebooting whilst DomU are running
On Wed, 2012-09-12 at 22:31 +0100, Maik Brauer wrote:> On Sep 12, 2012, at 10:13 AM, Ian Campbell wrote: > > The default is "vif-bridge". Have you changed the default? > > No I did not change this default script. Everything is at is has been delivered in the Xen Source package.No, it isn''t, you say below that you have changed xend-config.sxp> > > > If not then your configuration as shown will put three interfaces on the > > *same* bridge. Is this really what you want? > > No, because it will "not" put everything on the same bridge, because the default setting is "routed mode" due to the fact that > my providers network configuration has changed the routing. Therefore in xend-config.sxp we have the following disabled: > #(network-script network-route) > #(vif-script vif-route) > > and the next one enabled: > (network-script network-route) > (vif-script vif-route)So you have changed the default then, haven''t you! You have edited xend-config.sxp to change the default vif-script and network-script! What else have you changed from the defaults?> So basically you can see them as placeholder for the eth1, eth2 and eth4 devices in the Guest domU. > For the other 2 interfaces it is different. They should be bridged (different from default).Do you mean routed here? Do you understand the difference between routing and bridging?> Therefore I have to > put the "script=vif-bridge" in the config as shown above. See below the output of brctl show: > bridge name bridge id STP enabled interfaces > VLAN11 8000.000000000000 no > VLAN12 8000.000000000000 no > VLAN20 8000.feffffffffff no vif2.3 > VLAN30 8000.000000000000 no > VLAN40 8000.feffffffffff no vif2.5 > > > > > You claim above that the bridge is different for each interface, but > > unless you have changed something somewhere then this is not the case. > > Since you are having problems it is important to identify everything > > which you have changed from the defaults. > > No, I am saying that the bridge name is different. 
Not that the script is different. I am just creating isolated bridges > VLAN20, VLAN30, VLAN40, and so on in order to connect special network interfaces together from different domU''s.Except it turns out that half your interfaces aren''t even using bridging and aren''t using vif-bridge at all! Please, it is important to give all the facts and to be precise when you are asking people to debug a remote system.> >>>> List is empty. SysRQ -w and SysRQ-t shows nothing at all. > >>> > >>> You might need to increase the log verbosity with SysRQ-9 first? > >> > >> I did and now I got more Information. But due to the amount of data > which slips over the console screen I am not able > >> to record properly. Can you advice what to do here?Did you try SysRQ-w -- given the point at which your logs show the hang this should provide a much smaller amount of output than Sysrq-l and be much more manageable. In particular it will be useful to know what the "xenwatch" and "reboot" processes are waiting for.> > Did you add debugging to the hotplug scripts like I suggested a couple > > of mails back? > > No I didn''t up to now.Please do. For both the vif-bridge and vif-route scripts. Ian.
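The debugging Ian asks for can be done with a couple of lines at the top of each hotplug script. A sketch, to paste just after the shebang line of `/etc/xen/scripts/vif-bridge` and `/etc/xen/scripts/vif-route`; the log path is our choice, not a Xen default:

```shell
# Sketch: trace every hotplug invocation to a file, so the shutdown-time
# vif teardown can be inspected after the hang.  Log path is arbitrary.
exec 2>>/var/log/xen/hotplug-debug.log   # send the trace somewhere persistent
set -x                                   # echo every command as it runs
echo "$(date '+%F %T') $0 $* XENBUS_PATH=$XENBUS_PATH" >&2
```

Remember to remove the `set -x` again afterwards, since hotplug scripts run on every vif event.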
On Sep 13, 2012, at 7:38 AM, Ian Campbell wrote:

> On Wed, 2012-09-12 at 21:47 +0100, Maik Brauer wrote:
>> On Sep 12, 2012, at 12:23 PM, Ian Campbell wrote:
>>
>>> On Tue, 2012-09-11 at 23:46 +0100, Maik Brauer wrote:
>>>> You can easiliy reproduce this issue, when using more that 3 Network
>>>> devices.
>>>
>>> I installed a PV guest on Debian Wheezy using 5 vifs all on the same
>>> bridge and could not reproduce this, the domain was successfully
>>> shutdown on reboot.
>>>
>>> I also replaced the Debian xendomains script with the one from
>>> xen-4.1-testing (since Debian's differs). Still no repro.
>>>
>> Hi Ian, please try it with the stable squeeze one. Because I am currently not using an unstable version.
>> I have customers running on that machines so I can't experiment. Can you please try with Debian 64bit?
>
> In your initial mail you said you were running a 4.1.3 hypervisor.
> Squeeze has 4.0. Which is it?

I am using Debian 5.0.7 (SQUEEZE), with the 4.1.3 hypervisor and the 2.6.32-5-xen-amd64 kernel. I am not using the standard packaged hypervisor (4.0) shipped with Debian SQUEEZE.

> Ian.
On Sep 13, 2012, at 7:51 AM, Ian Campbell wrote:

> On Wed, 2012-09-12 at 21:47 +0100, Maik Brauer wrote:
>> On Sep 12, 2012, at 12:23 PM, Ian Campbell wrote:
>>
>>> On Tue, 2012-09-11 at 23:46 +0100, Maik Brauer wrote:
>>>> You can easiliy reproduce this issue, when using more that 3 Network
>>>> devices.
>>>
>>> I installed a PV guest on Debian Wheezy using 5 vifs all on the same
>>> bridge and could not reproduce this, the domain was successfully
>>> shutdown on reboot.
>>>
>>> I also replaced the Debian xendomains script with the one from
>>> xen-4.1-testing (since Debian's differs). Still no repro.
>>>
>> Hi Ian, please try it with the stable squeeze one. Because I am currently not using an unstable version.
>> I have customers running on that machines so I can't experiment.
>
> What about one of the several machines you have installed at home which
> show this issue?
>

I will try to install it on a WHEEZY machine for testing. But this will not help me, because at the end I need it working on Production machines. And I do not want to install this WHEEZY in production.
Ian Campbell
2012-Sep-13 08:58 UTC
Re: Dom0 crashed when rebooting whilst DomU are running
On Thu, 2012-09-13 at 09:55 +0100, Maik Brauer wrote:
> I will try to install it on a WHEEZY machine for testing. But this will not help me, because
> at the end I need it working on Production machines. And I do not want to install this WHEEZY in production.

Why not install a test system with exactly the same software as you use in production?
On Sep 13, 2012, at 10:58 AM, Ian Campbell wrote:

> On Thu, 2012-09-13 at 09:55 +0100, Maik Brauer wrote:
>> I will try to install it on a WHEEZY machine for testing. But this will not help me, because
>> at the end I need it working on Production machines. And I do not want to install this WHEEZY in production.
>
> Why not install a test system with exactly the same software as you use
> in production?
>

Yes, as I said, I will install a test system with the software I want to use. But even if this works afterwards, I will not use it in production, because Wheezy is still not stable.
Ian Campbell
2012-Sep-13 09:28 UTC
Re: Dom0 crashed when rebooting whilst DomU are running
On Thu, 2012-09-13 at 10:09 +0100, Maik Brauer wrote:> On Sep 13, 2012, at 10:58 AM, Ian Campbell wrote: > > > On Thu, 2012-09-13 at 09:55 +0100, Maik Brauer wrote: > >> I will try to install it on a WHEEZY machine for testing. But this will not help me, because > >> at the end I need it working on Production machines. And I do not want to install this WHEEZY in production. > > > > Why not install a test system with exactly the same software as you use > > in production? > > > Yes, as I said, I will install a test system with the software I want to use. But even this is working afterwards, I will > not use this in production, because wheezy is still not stable.Why Wheezy? What we need here is a system which has the same software as you use in production, which reproduces the issue and which you can play with and experiment with as much as you like without disturbing your customers. I''m not asking you to install Wheezy here, although if you think it will help and you can reproduce the issue with that configuration then go ahead. Please just be sure to be very clear about which exact environment any specific results you report were obtained in. Ian.
On Sep 13, 2012, at 11:28 AM, Ian Campbell wrote:

> On Thu, 2012-09-13 at 10:09 +0100, Maik Brauer wrote:
>> On Sep 13, 2012, at 10:58 AM, Ian Campbell wrote:
>>
>>> On Thu, 2012-09-13 at 09:55 +0100, Maik Brauer wrote:
>>>> I will try to install it on a WHEEZY machine for testing. But this will not help me, because
>>>> at the end I need it working on Production machines. And I do not want to install this WHEEZY in production.
>>>
>>> Why not install a test system with exactly the same software as you use
>>> in production?
>>>
>> Yes, as I said, I will install a test system with the software I want to use. But even this is working afterwards, I will
>> not use this in production, because wheezy is still not stable.
>
> Why Wheezy? What we need here is a system which has the same software as
> you use in production, which reproduces the issue and which you can play
> with and experiment with as much as you like without disturbing your
> customers.
>

I have that already, as mentioned in my last threads. What I can do is put the VM on a webserver for you, where you can download it. Then you have the environment where you can reproduce the issue. Is that OK for you? I am using Parallels for the test-setup system.

> I'm not asking you to install Wheezy here, although if you think it will
> help and you can reproduce the issue with that configuration then go
> ahead.
>
> Please just be sure to be very clear about which exact environment any
> specific results you report were obtained in.
>
> Ian.
Ian Campbell
2012-Sep-13 09:51 UTC
Re: Dom0 crashed when rebooting whilst DomU are running
On Thu, 2012-09-13 at 10:44 +0100, Maik Brauer wrote:> On Sep 13, 2012, at 11:28 AM, Ian Campbell wrote: > > > On Thu, 2012-09-13 at 10:09 +0100, Maik Brauer wrote: > >> On Sep 13, 2012, at 10:58 AM, Ian Campbell wrote: > >> > >>> On Thu, 2012-09-13 at 09:55 +0100, Maik Brauer wrote: > >>>> I will try to install it on a WHEEZY machine for testing. But this will not help me, because > >>>> at the end I need it working on Production machines. And I do not want to install this WHEEZY in production. > >>> > >>> Why not install a test system with exactly the same software as you use > >>> in production? > >>> > >> Yes, as I said, I will install a test system with the software I want to use. But even this is working afterwards, I will > >> not use this in production, because wheezy is still not stable. > > > > Why Wheezy? What we need here is a system which has the same software as > > you use in production, which reproduces the issue and which you can play > > with and experiment with as much as you like without disturbing your > > customers. > > > I have that already as mentioned in my last threads.The reason we are having this sub thread is that you said: "I have customers running on that machines so I can''t experiment.". If you have a test machine where you can reproduce the issue which you can experiment with then please feel free to use it.> What I can do, is that I can > provide the VM on a webserver for you, where you can download the VM. > Than you have the environment where you can reproduce the issue. Is that OK for you. > I am using Parallels for the Test-Setup System.I''m afraid I don''t have time any more to do anything other than suggest avenues for you to investigate and experiments for you to try. Ian.
On Sep 13, 2012, at 11:28 AM, Ian Campbell wrote:

> On Thu, 2012-09-13 at 10:09 +0100, Maik Brauer wrote:
>> On Sep 13, 2012, at 10:58 AM, Ian Campbell wrote:
>>
>>> On Thu, 2012-09-13 at 09:55 +0100, Maik Brauer wrote:
>>>> I will try to install it on a WHEEZY machine for testing. But this will not help me, because
>>>> at the end I need it working on Production machines. And I do not want to install this WHEEZY in production.
>>>
>>> Why not install a test system with exactly the same software as you use
>>> in production?
>>>
>> Yes, as I said, I will install a test system with the software I want to use. But even this is working afterwards, I will
>> not use this in production, because wheezy is still not stable.
>
> Why Wheezy? What we need here is a system which has the same software as
> you use in production, which reproduces the issue and which you can play
> with and experiment with as much as you like without disturbing your
> customers.

I tried it on several VMs and dedicated servers with SQUEEZE and the issue still persists.

> I'm not asking you to install Wheezy here, although if you think it will
> help and you can reproduce the issue with that configuration then go
> ahead.

When using the back-ported kernel in SQUEEZE (vmlinuz-3.2.0-0.bpo.3-amd64) there is no issue anymore. So it seems that it only happens with the latest SQUEEZE kernel (vmlinuz-2.6.32-5-xen-amd64). This is definitely something related to this kernel.

> Please just be sure to be very clear about which exact environment any
> specific results you report were obtained in.
>
> Ian.